Retargeting a C Compiler for a DSP Processor


Master thesis performed in electronics systems by Henrik Antelius
LiTH-ISY-EX
Linköping 2004


Master thesis in electronics systems at Linköping Institute of Technology by Henrik Antelius
LiTH-ISY-EX
Supervisors: Thomas Johansson, Ulrik Lindblad, Patrik Thalin
Examiner: Kent Palmkvist
Linköping,


Division, Department: Institutionen för systemteknik, LINKÖPING
Language: English
Report category: Examensarbete (master thesis)
ISRN: LITH-ISY-EX
Title (Swedish): Anpassning av en C-kompilator för kodgenerering till en DSP-processor
Title: Retargeting a C Compiler for a DSP Processor
Author: Henrik Antelius
Abstract: The purpose of this thesis is to retarget a C compiler for a DSP processor. Developing a new compiler from scratch is a major task. Instead, modifying an existing compiler so that it generates code for another target is a common way to develop compilers for new processors. This is called retargeting. This thesis describes how this was done with the LCC C compiler for the Motorola DSP56002 processor.
Keywords: retarget, compiler, LCC, DSP


Abstract

The purpose of this thesis is to retarget a C compiler for a DSP processor. Developing a new compiler from scratch is a major task. Instead, modifying an existing compiler so that it generates code for another target is a common way to develop compilers for new processors. This is called retargeting. This thesis describes how this was done with the LCC C compiler for the Motorola DSP56002 processor.


Table of contents

1 Introduction: Background; Purpose and goal; The reader; Reading guidelines
2 DSP: Introduction; Motorola DSP56002; Data buses; Address buses; Data ALU; Address generation unit; Program control unit; Instruction set; Assembly
3 Compilers: Introduction; The analysis-synthesis model; Phases; Analysis; Lexical analysis; Syntax analysis; Semantic analysis; Synthesis; Intermediate code generation; Code optimization; Code generation; Symbol table; Error handler; Front and back end; Environment; Preprocessor; Assembler; Linker and loader; Compiler tools
4 LCC: Introduction; C; The compiler; Lexical analysis; Syntax analysis; Semantic analysis; Intermediate code generation; Back end
5 Implementation: Introduction; The compiler; Data types and sizes; Register usage; Memory usage; Frame layout; Calling convention; Naming convention; Retargeting; Configuration; Declarations; Rules; C code; Special features; Other changes to LCC; The environment; crt; Problems; Register targeting; bit registers; Address registers; Improvements
6 Conclusions: Retargeting; Future work
References
Appendix A: Instructions: A.1 Arithmetic instructions; A.2 Logical instructions; A.3 Bit manipulation instructions; A.4 Loop instructions; A.5 Move instructions; A.6 Program control instructions
Appendix B: Sample code: B.1 sample.c; B.2 sample.asm
Appendix C: dsp56k.md
Index


1 Introduction

1.1 Background

The division of Electronics Systems (ES) at the department of Electrical Engineering (ISY) at Linköping University (LiU) is currently running a project aiming at developing a DSP processor. The goal of this project is to make a DSP with a scalable structure that is instruction level compatible with the Motorola DSP56002 processor. The scalability refers to variable data word length and addition or removal of memories and instructions. The goal of the scalability is to reduce the power consumption. Currently this project is nearly finished.

In order to increase the usability of the DSP a C compiler is needed. It was decided that the best way to create a C compiler was to retarget an existing C compiler. Creating a compiler from scratch is a big undertaking that requires a lot of work. Retargeting a compiler is a relatively easy task compared to developing an entire compiler.

1.2 Purpose and goal

The purpose of this thesis is to retarget a C compiler to the Motorola DSP56002 processor. The resulting compiler should, from one or more C source files, produce an executable file that can execute on the DSP. The only requirement on the compiler is that it should generate code

that works correctly and functions as intended. There are no requirements on the performance or the size of the generated code. The compiler should also be compatible with Motorola's C compiler and tools for the DSP. This makes it possible to mix generated code from the two compilers. It also means that the tools from Motorola can be used for the new compiler.

1.3 The reader

It is assumed that the reader of this thesis has basic knowledge of the C programming language and some knowledge of assembly language. It is also assumed that the reader has a general knowledge of how processors work and what function a compiler has.

1.4 Reading guidelines

This is a brief description of the chapters:

- Chapter 1 contains an introduction and states the purpose of the thesis.
- Chapter 2 describes how the DSP56002 processor works and how it can be used.
- Chapter 3 contains general compiler theory that is needed to understand how a compiler works.
- Chapter 4 describes the compiler LCC that was used in this thesis.
- Chapter 5 describes the implementation and modifications that were done to LCC.
- Chapter 6 lists the conclusions that were made and suggests further work.

2 DSP

This chapter contains a description of how the Motorola DSP56002 processor works. This information is collected from [4].

2.1 Introduction

Digital signal processing is, as the term suggests, the processing of signals by digital means. The signal is normally an electrical signal carried on a wire, but it can represent almost any kind of information and it can be processed in a wide variety of ways. Examples of digital signal processing include the following:

- Filtering of signals.
- Convolution, which is the mixing of two signals.
- Correlation, which is the comparison of two signals.
- Rectification, amplification and transformation of a signal.

All of these tasks were earlier performed using analog circuits. Nowadays integrated circuits have enough processing power to perform these and many other functions. The devices performing these tasks are called digital signal processors, or DSPs. They are specialised microprocessors with architectures designed specifically for the types of operations required in digital signal processing. Like general-purpose microprocessors, DSPs are programmable devices with their own native instruction sets.

DSPs can today be found in almost all areas of electronics, such as mobile phones, personal computers, digital television decoders, surround receivers, and so on. The advantages of using a DSP instead of analog circuits are many. Generally, fewer components are needed, DSPs have higher noise immunity, it is easy to change the behaviour of a filter, filters with closer tolerances can be built, and so on. Also, since the DSP is a microcomputer, the same hardware design can be used in many different areas by simply changing the software for the DSP.

2.2 Motorola DSP56002

The Motorola DSP56002 is a general purpose DSP processor with a triple-bus Harvard architecture. This architecture can access multiple memories at the same time. It uses fixed-point arithmetic and has three function units: the data arithmetic and logic unit (Data ALU), the address generation unit (AGU) and the program control unit (PCU). It also has three memories, two for data (X and Y) and one for the program (P). A block diagram of the DSP56002 can be seen in Figure 2.1.

Figure 2.1: Block diagram of the Motorola DSP56002

This architecture with multiple memories and buses makes it possible to, during one instruction cycle, make one computation in the data ALU while accessing the X and Y memories at the same time.

2.2.1 Data buses

The data buses consist of four 24-bit wide buses called the Y data bus (y_dbus), the X data bus (x_dbus), the program data bus (p_dbus) and the global data bus (g_dbus). They are used for moving data between the function units and the memories. Data transfers between the data ALU and the X and Y memories occur over the X and Y data buses, respectively. All other data movements occur over the global data bus, and instruction fetches occur over the program data bus.

2.2.2 Address buses

Addresses for the X data memory and the Y data memory are specified over the X address bus (x_abus) and the Y address bus (y_abus). Addresses for the program memory are specified over the P address bus (p_abus). All address buses are 16 bits wide.

2.2.3 Data ALU

The data ALU performs all of the arithmetic and logical operations on the data. It uses a register set that consists of four 24-bit input registers, two 48-bit accumulator registers and two 8-bit accumulator extension registers. The input registers are called X0, X1, Y0 and Y1. They can also be combined into two 48-bit registers called X and Y. The two accumulators are called A and B and are 56 bits wide. Each consists of three concatenated registers, A2:A1:A0 and B2:B1:B0. A2 and B2 are the 8-bit accumulator extension registers, and they are used when more than 48-bit accuracy is needed. The input registers are used for operands to the instructions, and the accumulator registers are used for both operands and the results of instructions.

2.2.4 Address generation unit

The AGU performs all of the address storage and address calculations necessary to access the data in the memories. The AGU is divided into two identical halves, each of which has an address arithmetic unit that can generate one address each instruction cycle. The AGU has three sets of eight registers. They are the address registers R0-R7, the offset

registers N0-N7 and the modifier registers M0-M7. The R-registers are used for storing addresses that are used to address the memories. The N- and M-registers are used to update the R-registers in various ways. The registers are connected, so that, for example, only N1 and M1 can be used to update R1.

2.2.5 Program control unit

The PCU performs instruction prefetch, instruction decoding, hardware loop control and interrupt processing. It contains a 15-level system stack that is 32 bits wide and the following six registers: program counter (PC), loop address (LA), loop counter (LC), status register (SR), operating mode register (OMR) and stack pointer (SP).

2.3 Instruction set

The instruction set can be seen in Appendix A. About half of the available instructions allow the use of parallel data moves.

2.4 Assembly

The instruction syntax is organized in four columns: opcode, operands and two parallel move fields. An example of a typical assembly instruction can be seen here:

    Opcode  Operands   XDB          YDB
    MAC     X0,Y0,A    X:(R0)+,X0   Y:(R4)+,Y0

The opcode column specifies the operation that should be performed. The operands column specifies which operands the opcode should use. The XDB and YDB columns specify optional data transfers over the X data bus and the Y data bus. The address space qualifiers X: and Y: indicate which memory is being referenced.

This is an example of a small assembly program:

            ORG     Y:
    var_a   dc      42
    var_b   dc      48
            ORG     P:$40
            MOVE    Y:var_a,X0
            MOVE    Y:var_b,A
            ADD     X0,A
            MOVE    A,Y:var_a

This program simply adds the variables var_a and var_b and stores the result in var_a.

This is a list of some of the features of the assembler that is used in this thesis:

- Labels: If the first character on a line is not a space or a tab, it starts a label. Labels are used for variables and jump destinations. A colon is often used to end the label to increase the readability of the assembly code.
- ORG: The ORG directive is used to indicate which memory the following statements belong to. It is also used for several other memory related settings.
- OPT: The OPT directive is used to assign options to the assembler.
- Variables: Variables are declared with a label and the DC directive, which defines a constant.
- GLOBAL: The GLOBAL keyword is used to instruct the assembler that a variable is global.
- Comments: A semicolon starts a comment. All characters to the right of the semicolon are ignored.
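To make the parallel-move instruction from Section 2.4 more concrete, the following C sketch models how repeating MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+,Y0 can accumulate a sum of products. It is a simplified model under stated assumptions, not a description of the hardware: the type Dsp, the functions mac_step and dot_product and the memory size are hypothetical names chosen for illustration, operands are treated as plain integers, and fractional scaling, saturation and the N and M registers are ignored.

```c
#include <stdint.h>

#define MEM_WORDS 16  /* arbitrary memory size for this sketch */

/* Hypothetical, simplified machine state: two data memories,
   two input registers, one accumulator and two address registers. */
typedef struct {
    int32_t x_mem[MEM_WORDS];
    int32_t y_mem[MEM_WORDS];
    int32_t x0, y0;   /* input registers */
    int64_t a;        /* accumulator (56 bits wide on the real device) */
    unsigned r0, r4;  /* address registers */
} Dsp;

/* Model of: MAC X0,Y0,A  X:(R0)+,X0  Y:(R4)+,Y0
   The multiply-accumulate uses the values already in X0 and Y0.
   The parallel moves then load the next operands and post-increment
   R0 and R4. */
static void mac_step(Dsp *d)
{
    d->a += (int64_t)d->x0 * d->y0;
    d->x0 = d->x_mem[d->r0++];
    d->y0 = d->y_mem[d->r4++];
}

/* Sum of products of the first n words of X and Y memory.
   The last step loads one word past the data, so n must be
   smaller than MEM_WORDS. */
static int64_t dot_product(Dsp *d, unsigned n)
{
    d->a = 0;
    d->r0 = 0;
    d->r4 = 0;
    /* Preload the first operand pair before the first MAC. */
    d->x0 = d->x_mem[d->r0++];
    d->y0 = d->y_mem[d->r4++];
    for (unsigned i = 0; i < n; i++)
        mac_step(d);
    return d->a;
}
```

With X memory holding 1, 2, 3 and Y memory holding 4, 5, 6, dot_product(&d, 3) accumulates 1*4 + 2*5 + 3*6. The point of the sketch is that one instruction both computes and fetches the next operands, which is what the two parallel move fields express.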


3 Compilers

This chapter contains general compiler theory. Most of the information is collected from [1].

3.1 Introduction

A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language. An important part of this process is to report the presence of errors in the source program to the user. There exist thousands of different compilers for different source languages and target languages, and there also exist many different types of compilers. However, the basic principles of how the compilers work are the same. This chapter will discuss these basic principles.

3.2 The analysis-synthesis model

There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into consecutive pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.

During analysis the operations stated in the source program are determined and recorded in a hierarchical structure called a tree. Often a special kind of tree called a syntax tree is used. In the synthesis part of the compilation the output is generated from the contents of the syntax tree. There is often also some sort of optimization of the generated code in this part.

3.3 Phases

A compiler operates in phases, each of which transforms the source program from one representation to another. A typical decomposition of a compiler is shown in Figure 3.1. The following sections will discuss the different phases and how they are connected.

Figure 3.1: Phases of a compiler

3.4 Analysis

The analysis consists of three phases: lexical analysis, syntax analysis and semantic analysis.

3.4.1 Lexical analysis

Lexical analysis, sometimes called scanning, is where the stream of characters that make up the source program is scanned left-to-right and transformed into groups of characters called tokens. For example, the characters in the statement

    result = start + rate * 60

would be transformed into the following tokens:

1. The identifier result.
2. The assignment symbol =.
3. The identifier start.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60.

The white space is normally eliminated during lexical analysis.

3.4.2 Syntax analysis

Syntax analysis, or parsing, is where the tokens of the source program are grouped into grammatical phrases. Usually the phrases of the source program are represented by a parse tree. An example of a parse tree can be seen in Figure 3.2.
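The scanning step described in Section 3.4.1 can be sketched in C roughly as follows. This is a minimal illustration and not the scanner used by any particular compiler: the names TokenKind, Token and next_token are hypothetical, and only the token classes needed for the example statement are recognized.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_END, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
static Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (isspace((unsigned char)*p))   /* white space is discarded */
        p++;

    tok.start = p;
    if (*p == '\0') {
        tok.kind = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok.kind = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        tok.kind = TOK_NUM;
    } else {
        switch (*p) {
        case '=': tok.kind = TOK_ASSIGN; break;
        case '+': tok.kind = TOK_PLUS;   break;
        case '*': tok.kind = TOK_STAR;   break;
        default:  tok.kind = TOK_ERROR;  break;
        }
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling next_token repeatedly on the string "result = start + rate * 60" yields the seven tokens listed above, followed by TOK_END. Each token refers back into the source text instead of copying it, which is one possible design choice among several.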

Figure 3.2: Parse tree for the statement result=start+rate*60

The phrase rate*60 will be grouped together because the rules of arithmetic expressions state that multiplication is performed before addition.

Context free grammars

The rules for the syntax analysis are often expressed by context free grammars. The grammar gives a precise and easy to understand specification of the syntax of the programming language. It is also possible to construct a parser from a grammar by using automated tools. For example, an if-else statement in C has the form:

    if ( expression ) statement else statement

The statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement. Using the variable expr for expression and stmt for statement, this rule can be expressed as:

    stmt → if ( expr ) stmt else stmt

The arrow may be read as "can have the form". This kind of rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called tokens. Variables like expr and stmt represent sequences of tokens and are called nonterminals.

A context free grammar has four components:

1. A set of tokens, known as terminal symbols.
2. A set of nonterminals.

3. A set of productions where each production consists of a nonterminal, an arrow, and a sequence of tokens and/or nonterminals.
4. A designation of one of the nonterminals as the start symbol.

The following is an example of a simple grammar that can parse the right hand side of the assignment statement in Figure 3.2:

    expr → identifier
    expr → number
    expr → expr + expr | expr * expr

The symbol | is used to separate multiple productions on one line and can be read as "or". By using expr as the start symbol, the derivation of the right hand side of the assignment statement could look like this:

    expr ⇒ expr + expr
         ⇒ identifier + expr
         ⇒ identifier + expr * expr
         ⇒ identifier + identifier * expr
         ⇒ identifier + identifier * number

A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal. The set of token strings that can be derived from the start symbol forms the language defined by the grammar.

Syntax tree

A more common internal representation of the syntactic structure is the syntax tree. It is a compressed representation of the parse tree where the operators appear as the nodes, and the operands of an operator are the children of that node. An example of a syntax tree is seen in Figure 3.3.

Figure 3.3: Syntax tree for the statement result=start+rate*60
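As an illustration of how a syntax tree like the one in Figure 3.3 can be built, the following C sketch parses the right hand side of the example statement with a small recursive descent parser. It rests on several assumptions: the grammar above is ambiguous as written, so the sketch uses the common layered form (expr, term, factor) to give multiplication higher precedence, which is a substitution and not taken from the text. The names Node, parse_expr, parse_term, parse_factor and match_char are hypothetical, and nodes are never freed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char op;                  /* '+', '*', 'i' (identifier) or 'n' (number) */
    char text[16];            /* lexeme, used for leaves only */
    struct Node *left, *right;
} Node;

static Node *new_node(char op, Node *left, Node *right)
{
    Node *n = calloc(1, sizeof *n);   /* text starts out as "" */
    if (n == NULL)
        abort();
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

static const char *skip_space(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Consume the character c if it is next in the input. */
static int match_char(const char **pp, char c)
{
    const char *p = skip_space(*pp);
    if (*p != c)
        return 0;
    *pp = p + 1;
    return 1;
}

/* factor -> identifier | number */
static Node *parse_factor(const char **pp)
{
    const char *start = skip_space(*pp);
    const char *p = start;
    Node *n;
    size_t len;

    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        n = new_node('n', NULL, NULL);
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        n = new_node('i', NULL, NULL);
    } else {
        return NULL;                  /* syntax error */
    }
    len = (size_t)(p - start);
    if (len >= sizeof n->text)
        len = sizeof n->text - 1;     /* truncate long lexemes */
    memcpy(n->text, start, len);
    *pp = p;
    return n;
}

/* term -> factor { '*' factor } */
static Node *parse_term(const char **pp)
{
    Node *left = parse_factor(pp);
    while (left != NULL && match_char(pp, '*')) {
        Node *right = parse_factor(pp);
        left = right ? new_node('*', left, right) : NULL;
    }
    return left;
}

/* expr -> term { '+' term } */
static Node *parse_expr(const char **pp)
{
    Node *left = parse_term(pp);
    while (left != NULL && match_char(pp, '+')) {
        Node *right = parse_term(pp);
        left = right ? new_node('+', left, right) : NULL;
    }
    return left;
}
```

For the input "start + rate * 60" the root of the returned tree is the + node, with the identifier start as its left child and the * node as its right child. The operators are the inner nodes and the operands are their children, as in the syntax tree described above.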

3.4.3 Semantic analysis

The semantic analysis phase checks the source program for semantic errors and gathers type information for the code generation phase. It uses the hierarchical structure generated in the syntax analysis phase to identify the operators and operands of expressions and statements. This checking ensures that certain kinds of programming errors will be detected and reported. Examples of semantic checks are:

- Type checks: The compiler should report an error if an operator is applied to an incompatible operand, for example if an integer variable is added to a function. It can also check that parameters to functions are correct in type and number.
- Flow-of-control checks: Statements that cause the flow of control to leave a construct must have some place to which to transfer the flow of control. For example, a break statement in C causes the flow of control to leave the enclosing while, for or switch statement. If break is used outside of one of those, an error is generated.
- Uniqueness checks: Sometimes an object can only be defined once. For example, the case labels in a switch statement in C must be unique, and variables with the same name in the same scope are not permitted.
- Name related checks: Sometimes the same name must appear at multiple locations. For example, in Ada a loop or block may have a name that appears at the beginning and at the end of the construct.

Many more types of checks may be needed, depending on the language. In C, for example, functions and variables must be declared before they are used, something that is not necessary in some languages.

The type checking does not always have to result in an error. For example, a type mismatch can sometimes be resolved by converting the operand. If a in the statement a = a * 2; is a floating point number, the integer 2 must be converted to a floating point number before the multiplication can take place. This is accomplished by inserting a new node that explicitly converts an integer to a floating point number in the syntax tree.
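The insertion of a conversion node can be sketched in C as follows. This is a minimal illustration assuming a tree with only two types and one binary operator; the names Node, make_node, to_float and check_mul are hypothetical and do not come from any particular compiler.

```c
#include <stdlib.h>

typedef enum { TYPE_INT, TYPE_FLOAT } Type;
typedef enum { N_VAR, N_CONST, N_MUL, N_INT_TO_FLOAT } Kind;

typedef struct Node {
    Kind kind;
    Type type;
    struct Node *left, *right;
} Node;

static Node *make_node(Kind kind, Type type, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->type = type;
    n->left = left;
    n->right = right;
    return n;
}

/* Return the operand unchanged if it is already a float,
   otherwise wrap it in an explicit conversion node. */
static Node *to_float(Node *operand)
{
    if (operand->type == TYPE_FLOAT)
        return operand;
    return make_node(N_INT_TO_FLOAT, TYPE_FLOAT, operand, NULL);
}

/* Type-check a multiplication whose operands are already typed.
   If either operand is a float, the other one is converted and
   the result is a float; otherwise the result is an int. */
static void check_mul(Node *mul)
{
    if (mul->left->type == TYPE_FLOAT || mul->right->type == TYPE_FLOAT) {
        mul->left = to_float(mul->left);
        mul->right = to_float(mul->right);
        mul->type = TYPE_FLOAT;
    } else {
        mul->type = TYPE_INT;
    }
}
```

For a * 2 with a floating point a, check_mul leaves the left operand alone and replaces the right operand with an N_INT_TO_FLOAT node whose child is the constant 2. Later phases then see only operations on operands of matching types.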

Since programming languages and the semantic checks they need differ so much, there is no systematic way to perform the semantic checks. It is usually done by traversing the tree and examining the nodes, or during the syntax analysis phase.

3.5 Synthesis

The synthesis consists of three phases: intermediate code generation, code optimization and code generation. It is responsible for transforming the source, which is now in the form of a syntax tree, to the output language.

3.5.1 Intermediate code generation

After the syntax and semantic analysis some compilers generate a machine independent intermediate form of the source program. Although the source program can be translated directly to the target language from the syntax tree, there are some benefits of using an intermediate form:

- A machine independent optimizer can be used on the intermediate representation.
- Retargeting is made easier. Creating a compiler for a different machine can be done by replacing a smaller part of the compiler than would otherwise have been necessary.

The intermediate representation should have two important properties: it should be easy to generate and it should be easy to transform into the target program. A common way to achieve this is to use a so-called three-address code. It is very similar to an assembly language where each memory location can be used as a register. The code consists of a sequence of instructions, each of which can have at most three operands. For example, the assignment statement from Figure 3.2 might look like this:

    temp1 = rate * 60
    temp2 = temp1 + start
    result = temp2

There are also statements for conditional and unconditional jumps, procedure calls, return statements, indexed assignment to be used on arrays, and address and pointer assignments. The instruction set of three-address codes must be large enough to implement the operations in the source language, but a smaller

instruction set is easier to implement and retarget. However, if it is too small the intermediate code generator can be forced to generate long sequences of statements for some source language operations. It will then be more difficult for the optimizer and the code generator to produce good code.

3.5.2 Code optimization

The code optimizer will attempt to improve the intermediate code so that faster running machine code will be generated. It can sometimes also be of interest to make the code smaller. For DSP processors, code with lower power consumption is sometimes preferred. There are two types of optimizations that can be done: machine independent and machine dependent. Machine independent optimizations are typically done using the intermediate form as the base and do not consider any details of the target architecture when making optimization decisions. They are often very general in nature. Machine dependent optimizations can be done both on the intermediate form and on the generated code. These optimizations consider the target architecture specifically and use special instructions such as hardware loops. There are a number of common optimization techniques.

Constant propagation

Constant propagation is simply the replacement of variable references with constant references when possible. For example, in

    a = 3;
    function_call(a + 42);

the call becomes

    function_call(3 + 42);

Constant folding

Expressions with constant operands can be calculated at compile time. The example above would be transformed to

    function_call(45);

Programmers usually do not write expressions such as 3+42 directly, but these expressions are quite common after macro expansion and other optimizations such as constant propagation.
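Constant folding can be sketched in C as a bottom-up walk over an expression tree. This is a minimal illustration under stated assumptions: the tree type Expr and the functions make_expr and fold are hypothetical, only addition and multiplication are handled, and integer overflow is ignored.

```c
#include <stdlib.h>

typedef enum { E_CONST, E_VAR, E_ADD, E_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                 /* meaningful for E_CONST only */
    struct Expr *left, *right; /* NULL for leaves */
} Expr;

static Expr *make_expr(ExprKind kind, int value, Expr *left, Expr *right)
{
    Expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->kind = kind;
    e->value = value;
    e->left = left;
    e->right = right;
    return e;
}

/* Fold constant subtrees in place, children first. A binary node
   whose operands are both constants is turned into a constant. */
static void fold(Expr *e)
{
    if (e == NULL || e->kind == E_CONST || e->kind == E_VAR)
        return;

    fold(e->left);
    fold(e->right);

    if (e->left->kind == E_CONST && e->right->kind == E_CONST) {
        int l = e->left->value;
        int r = e->right->value;
        e->value = (e->kind == E_ADD) ? l + r : l * r;
        e->kind = E_CONST;
        free(e->left);
        free(e->right);
        e->left = NULL;
        e->right = NULL;
    }
}
```

Applied to the tree for 3 + 42, fold turns the root into the constant 45. In a tree such as x + 2 * 3, only the multiplication is folded, because x is not known at compile time.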

Common subexpression elimination

A common subexpression, or CSE, is created when two or more expressions compute the same value. The expression is calculated once to a temporary variable that is used instead of the CSE. For example, the statement

    array1[i + 1] = array2[i + 1];

will be transformed to

    temp1 = i + 1;
    array1[temp1] = array2[temp1];

Dead code elimination

Code that is never reached or that does not affect the program can be eliminated. For example, this code fragment

    int global;
    void foo(void) {
        int k = 1;
        global = 1;
        global = 2;
    }

will transform into the following

    int global;
    void foo(void) {
        global = 2;
    }

Expression simplification

Some expressions can be simplified by replacing them with a more efficient expression. For example, i+0 will be replaced by i, i*0 and i-i by 0, and so on.

Code motion

Expressions in a loop that give the same result each time the loop is iterated can be moved outside the loop and calculated only once before entering the loop.
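The effect of code motion can be shown with a before-and-after pair in C. This is a hand-written illustration of the transformation, not compiler output; the function names fill_before and fill_after are hypothetical.

```c
/* Before: scale * offset does not depend on i, but it is
   recomputed in every iteration of the loop. */
static void fill_before(int *out, int n, int scale, int offset)
{
    for (int i = 0; i < n; i++)
        out[i] = i + scale * offset;
}

/* After code motion: the loop-invariant product is computed
   once before the loop is entered. */
static void fill_after(int *out, int n, int scale, int offset)
{
    int invariant = scale * offset;
    for (int i = 0; i < n; i++)
        out[i] = i + invariant;
}
```

Both functions store the same values in out. The second version performs one multiplication in total instead of one per iteration.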

Strength reduction

Strength reduction replaces expensive instructions with less expensive instructions. For instance, a popular strength reduction is to replace a multiplication by a constant power of two with a left shift.

3.5.3 Code generation

The final phase of the compiler is the generation of target code. The target code is usually relocatable machine code or assembly code. Memory locations are selected for each of the variables used in the source program, and the intermediate instructions are translated into one or more assembly level instructions that perform the same task. A vital part of code generation is the assignment of registers to variables, since that can greatly affect the performance of the generated code. Using the example from the previous sections, the generated code might look like this:

    MOVE    rate, R1
    MUL     #60, R1
    MOVE    start, R2
    ADD     R1, R2
    MOVE    R2, result

3.6 Symbol table

An essential part of the compiler is to keep track of the identifiers used in the source program and to collect information about various attributes of each identifier. These attributes contain information about the name and type of the identifier, its size, scope and so on. For functions and procedures they also contain the number and types of the arguments and the return type. More complex data types like arrays and structures are handled in a similar way.

The symbol table is a data structure that contains a record for each identifier and fields for the attributes of the identifier. The data structure makes it possible to search for identifiers, to add or retrieve their attributes and to add new identifiers. When an identifier is found in the lexical analysis, its name is added to the symbol table if it is not already there. The index in the symbol table is then passed along in the token, and that index is used to refer to the identifier from there on. In the later phases of compilation information

about the type and other attributes is added and used in various ways.

3.7 Error handler

It is important that the compiler can detect errors and deal with them in a reasonable way. When an error is encountered, the compiler emits an error message containing the location of the error in the source program and a description of the type of error. It then tries to continue with the compilation. It can sometimes be difficult for the compiler to know how to continue when an error has been detected. One way is, for example, to skip all input until the next semicolon to get to the next statement. As soon as the error count is greater than zero, a flag is set and the compiler will stop execution after the semantic analysis phase. There is no point in generating the target program when there exist errors in the source program. The compiler can also detect minor errors that will not stop the compilation and emit warnings about these errors instead.

3.8 Front and back end

Often the phases are collected into a front end and a back end. The front end consists of the phases that depend on the source language and are largely independent of the target machine. These normally include lexical analysis, syntax analysis, semantic analysis and the generation of intermediate code. The machine independent optimizations can also be done in the front end. The creation of the symbol table and most of the error handling are also done in the front end.

The back end includes the parts of the compiler that are dependent on the target machine. These parts usually do not depend on the source language, only on the intermediate code. The back end therefore consists of the code optimizer and the code generator. It also uses the symbol table and the error handler.

This division of the design makes it easy to take the front end of a compiler and combine it with a new back end to produce a compiler for the same source language for a different target machine. This is called retargeting.
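One way to picture the division between front end and back end is a table of function pointers that the front end calls without knowing which target is behind it. The following C sketch is a loose illustration of that idea under stated assumptions: the type Backend, the two example back ends and the function translate_add are hypothetical, and the two output syntaxes are generic two-operand and three-operand forms chosen for illustration rather than the syntax of any specific assembler.

```c
#include <stdio.h>

/* Hypothetical back-end interface: everything target specific
   is reached through this record. */
typedef struct {
    const char *name;
    void (*emit_add)(char *out, size_t size,
                     const char *src, const char *dst);
} Backend;

/* Target with two-operand instructions: dst = dst + src. */
static void emit_add_two_operand(char *out, size_t size,
                                 const char *src, const char *dst)
{
    snprintf(out, size, "ADD %s,%s", src, dst);
}

/* Target with three-operand instructions: dst = dst + src. */
static void emit_add_three_operand(char *out, size_t size,
                                   const char *src, const char *dst)
{
    snprintf(out, size, "add %s, %s, %s", dst, dst, src);
}

static const Backend backend_a = { "two-operand target", emit_add_two_operand };
static const Backend backend_b = { "three-operand target", emit_add_three_operand };

/* Target independent code: it only knows that an addition is
   needed and leaves the instruction syntax to the back end. */
static void translate_add(const Backend *be, const char *src,
                          const char *dst, char *out, size_t size)
{
    be->emit_add(out, size, src, dst);
}
```

Passing &backend_a to translate_add produces a two-operand ADD, and passing &backend_b produces the three-operand form, without any change to translate_add itself. Retargeting, in this picture, amounts to supplying a new Backend record.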

3.9 Environment

In addition to the compiler, several other programs are required if an executable program is to be created. See Figure 3.4.

Figure 3.4: A compiler system

3.9.1 Preprocessor

The preprocessor produces the input to the compiler. It often performs different kinds of text processing, for example macro processing and file inclusion. In C, for example, all lines beginning with # are instructions to the preprocessor: #define FAIL -1 causes all occurrences of FAIL to be replaced by -1, and #include <file.h> will include the file file.h in the source program. In C the preprocessor also removes the comments from the source program.

3.9.2 Assembler

Some compilers produce assembly code, which must be passed to an assembler for further processing. The assembler works much like a compiler and translates the assembly source to relocatable machine code.

3.9.3 Linker and loader

The linker makes it possible to combine several relocatable machine code files into a single program. The different machine code files can be the result of several compilations, and some may be library files. The linker resolves external references in the input files so that data and functions from the different files can be used by each other. When all the external references are resolved, the loader takes the relocatable machine code, alters all relocatable addresses to real addresses, places the code and the data in their proper locations and creates the output file.

3.10 Compiler tools

Since most compilers use the same structure and function in the same way, specialized tools have been developed that help implement the various components of the compiler. These tools use specialized languages for specifying and implementing the components, and many use algorithms that are quite sophisticated. The following is a list of some compiler construction tools:

- Scanner generators: These automatically generate lexical analyzers, usually from a specification based on regular expressions. Examples include flex and lex.
- Parser generators: These produce syntax analyzers from specifications that are normally based on a context free grammar. Before the parser generators appeared, the parser was the most time consuming part to implement. Now it is considered to be one of the easiest, thanks to the parser generators. Examples of parser generators are yacc and bison.
- Syntax-directed translation engines: These produce routines that walk the parse tree and generate intermediate code.
- Automatic code generators: These tools generate routines that translate the intermediate language into the machine language for the target machine with the help of a collection of rules. The basic technique is template matching. The intermediate code statements are replaced by templates that represent sequences of machine instructions.
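The template idea behind automatic code generators can be sketched in C as a table that maps each intermediate operation to a text template. This is a deliberately simplified illustration, far from what a real code generator generator produces: the names IrOp, IrInstr, templates, expand and generate are hypothetical, the placeholders $s and $d are an invented notation for source and destination, and the mnemonics follow the generic example code from Section 3.5.3.

```c
#include <stddef.h>
#include <string.h>

typedef enum { IR_MOVE, IR_ADD, IR_MUL, IR_COUNT } IrOp;

typedef struct {
    IrOp op;
    const char *src;
    const char *dst;
} IrInstr;

/* One template per intermediate operation. A template may contain
   several lines if an operation needs more than one instruction. */
static const char *const templates[IR_COUNT] = {
    [IR_MOVE] = "MOVE $s,$d\n",
    [IR_ADD]  = "ADD $s,$d\n",
    [IR_MUL]  = "MUL $s,$d\n",
};

/* Copy tmpl to out, replacing $s with src and $d with dst.
   Output that does not fit is cut off. Returns the length written. */
static size_t expand(const char *tmpl, const char *src, const char *dst,
                     char *out, size_t size)
{
    size_t used = 0;

    for (const char *p = tmpl; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        const char *piece = one;
        size_t len;

        if (p[0] == '$' && p[1] == 's') {
            piece = src;
            p++;
        } else if (p[0] == '$' && p[1] == 'd') {
            piece = dst;
            p++;
        }
        len = strlen(piece);
        if (used + len >= size)
            break;                    /* truncate rather than overflow */
        memcpy(out + used, piece, len);
        used += len;
    }
    if (size > 0)
        out[used] = '\0';
    return used;
}

/* Replace every intermediate instruction by its expanded template. */
static size_t generate(const IrInstr *code, size_t count,
                       char *out, size_t size)
{
    size_t used = 0;

    if (size > 0)
        out[0] = '\0';
    for (size_t i = 0; i < count && used < size; i++)
        used += expand(templates[code[i].op], code[i].src, code[i].dst,
                       out + used, size - used);
    return used;
}
```

Given the two intermediate instructions {IR_MOVE, "rate", "R1"} and {IR_MUL, "#60", "R1"}, generate writes the lines MOVE rate,R1 and MUL #60,R1. Changing the target in this sketch means changing only the templates table.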


4 LCC

This chapter describes how the compiler LCC works. Most of this information is collected from [2] and [3].

4.1 Introduction

LCC is a free ANSI C compiler that is designed to be retargetable. The source code is available for download from the internet [6] under a license [7] that imposes almost no restrictions at all. This compiler was chosen because it is very small and simple. It is designed in a way so that it is easy to retarget it to generate code for other processors. There is also excellent documentation of LCC in the form of a book that describes every detail of the implementation of the entire compiler. It is called A Retargetable C Compiler: Design and Implementation and it was used extensively during this thesis. The thesis could probably not have been completed without the book.

Another compiler candidate was the GNU C Compiler, or GCC, from the GNU Compiler Collection, which is an open source C compiler. GCC would probably have generated better and faster code, but it was not chosen because it is much bigger and more complex than LCC. Also, the same kind of documentation that was available for LCC was not available for GCC.

4.2 C

C is a general-purpose programming language that was developed during the 1970s by Dennis Ritchie, and it is still widely used today. It is a relatively low-level language in which the basic data types correspond to data types found in the hardware. The language provides no operations that deal directly with composite data types such as strings, arrays and lists. There are no input/output facilities and no file access facilities. All these higher-level operations must be provided by library functions.

These and several other limitations have some advantages. They make the language small and relatively easy to learn. They also mean that compilers for the language are smaller and easier to construct.

C has become very popular, and there exist compilers for many different processors and operating systems. Although it is far from an ideal language for DSP processors, it is still extensively used for them. That is probably because it is such a simple and low-level language, which makes it easier to construct a compiler that generates efficient code for DSP processors.

Over the years the C programming language has evolved and been standardized a couple of times. The first version, called K&R C (from Kernighan and Ritchie), is derived from the reference manual in the first edition of the book The C Programming Language by Brian Kernighan and Dennis Ritchie. In 1989 ANSI standardized the language, and that version is commonly referred to as ANSI C or C89. ISO has released two standards for C, called ISO C90 and ISO C99.

4.3 The compiler

The following sections describe the different phases of the compiler and how they work.

Lexical analysis

The lexical analyzer reads source text and produces tokens. For each token the lexical analyzer returns its token code and zero or more associated values. The token codes for single-character tokens, for example = and +, are the characters themselves. For tokens that can consist of one or more characters, for example identifiers and constants, defined constants are used. For example, the expression ptr = 42 results in the following token stream:

    ID    "ptr"  symbol table entry for ptr
    '='
    ICON  "42"   symbol table entry for 42

The token code for the operator = is the numeric value of the character =, and it has no associated values. The token code for the identifier ptr is the value of the constant ID, and the associated values are the identifier string itself and a pointer to the symbol table entry for the identifier. The integer constant 42 returns the token code ICON, and the associated values are the string "42" and a pointer to its symbol table entry. Keywords, such as for and switch, have their own token codes to distinguish them from identifiers.

The lexical analyzer also tracks the source coordinates of each token. These coordinates contain the file name, the line number and the position on the line of the first character of the token. The coordinates are used to report the location of errors.

Recognizing tokens

The lexical analyzer in LCC is written by hand; it is not generated by a tool. This is because the lexical structure of C is simple and because generated analyzers tend to be large and slow. The lexical analyzer is used by calling the function gettok(), which returns the next token. gettok() recognizes a token by using a switch statement on the first character of the token to classify it. It then consumes the following characters that make up the token. The following is a small sample of the code:

    ...
    switch (*rcp++) {
    ...
    case '<':
        if (*rcp == '=') return cp++, LEQ;
        if (*rcp == '<') return cp++, LSHIFT;
        return '<';
    ...

rcp and cp are pointers to the next character in the input file. The code for identifying most of the tokens looks very similar to this example. Identifying numbers, strings and identifiers is a bit harder, but it works in the same way, by looking ahead in the input stream.
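The classification by first character described above can be tried out with a small stand-alone scanner. The following is only a minimal sketch and not LCC's actual code: the token names LEQ, LSHIFT, ID and ICON come from the text, while their numeric values, the end-of-input code EOI and the single pointer cp are assumptions made for the illustration.

```c
#include <ctype.h>

/* Hypothetical token codes for multi-character tokens. They start above the
   character range so they cannot collide with single-character tokens. */
enum { LEQ = 256, LSHIFT, ID, ICON, EOI };

static const char *cp;   /* next unread character of the input */

/* Return the code of the next token, classified by its first character. */
static int gettok(void) {
    int c;
    while (*cp == ' ' || *cp == '\t' || *cp == '\n')
        cp++;                                   /* skip white space */
    c = (unsigned char)*cp;
    if (c == '\0')
        return EOI;
    cp++;
    switch (c) {
    case '<':
        if (*cp == '=') { cp++; return LEQ; }    /* look ahead one character */
        if (*cp == '<') { cp++; return LSHIFT; }
        return '<';
    default:
        if (isdigit(c)) {                        /* integer constant */
            while (isdigit((unsigned char)*cp))
                cp++;
            return ICON;
        }
        if (isalpha(c) || c == '_') {            /* identifier */
            while (isalnum((unsigned char)*cp) || *cp == '_')
                cp++;
            return ID;
        }
        return c;   /* single-character token: the code is the character */
    }
}
```

To use the sketch, point cp at a string such as "a <= b << 2" and call gettok() repeatedly until it returns EOI. A real lexical analyzer would also return the associated values and the source coordinates, which are left out here.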

Syntax analysis

The syntax analyzer, or parser, takes the stream of tokens from the lexical analyzer and confirms that it follows the syntax of the language. It also builds an internal representation of the input that is used by the rest of the compiler. The parser in LCC is also written by hand. The reason is the same as for the lexical analyzer: C is a simple language, and the code generated by tools is slow and big.

Grammar

LCC uses a context-free grammar written in EBNF to define the rules for the parser. The parser is constructed by writing a parsing function for each nonterminal. The idea is to write a function X() for each nonterminal X, using the productions for X as a guide when writing the code for X(). For example, the parsing function for the production

    expr → term { + term }

will look like

    void expr(void) {
        term();
        while (t == '+') {
            t = gettok();
            term();
        }
    }

The { and } in the production are an EBNF feature that means zero or more.

Abstract syntax tree

While parsing the program, the compiler also generates an intermediate representation of it. This is done in the form of abstract syntax trees, or simply trees. Abstract syntax trees are parse trees without the nodes for nonterminals and without the nodes for useless terminals. For example, the tree for the expression (a+b)*c can be seen in Figure 4.1.
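The scheme of one parsing function per nonterminal, described under Grammar above, can be made runnable with a few helpers. This is a sketch under stated assumptions, not LCC's parser: the production term → NUM, the simplified gettok(), the counters terms and errors, and the wrapper parse() are all invented for the illustration. Only expr() follows the form given in the text.

```c
#include <ctype.h>

enum { NUM = 256 };        /* hypothetical token code for a number */

static const char *input;  /* remaining input text */
static int t;              /* current token: a character, NUM, or 0 at end */
static int terms;          /* number of terms recognized so far */
static int errors;         /* number of syntax errors found */

/* Minimal stand-in for gettok(): numbers and single characters only. */
static int gettok(void) {
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;
        return NUM;
    }
    return (unsigned char)*input++;
}

/* term -> NUM (a deliberately tiny production for this sketch) */
static void term(void) {
    if (t == NUM) {
        terms++;
        t = gettok();
    } else {
        errors++;
    }
}

/* expr -> term { + term } */
static void expr(void) {
    term();
    while (t == '+') {
        t = gettok();
        term();
    }
}

/* Parse a whole string; returns 1 if it is a valid expr, 0 otherwise. */
static int parse(const char *s) {
    input = s;
    terms = 0;
    errors = 0;
    t = gettok();
    expr();
    return errors == 0 && t == 0;
}
```

With these definitions, parse("1 + 22 + 3") accepts the input and counts three terms, while parse("1 + + 2") is rejected. The while loop in expr() is the direct translation of the EBNF braces: it runs zero or more times, once for each "+ term".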

Figure 4.1: Tree for the expression (a+b)*c

There are no nodes for the nonterminals used when parsing this expression, and there are no nodes for the tokens ( and ). The tokens + and * are contained in the nodes ADD+I and MUL+I. The nodes with the operator ADDRG+P compute the address of their operand, and the INDIR+I nodes fetch the integer at the address given by their operand.

The names of the nodes are constructed from an operator and a type suffix that denotes the type the operator operates on. For example, the node ADD+I denotes integer addition. Table 4.1 lists the available type suffixes.

    Type suffix   Meaning
    F             Floating point
    I             Integer
    U             Unsigned
    P             Pointer
    V             Void
    B             Structure

Table 4.1: Type suffixes

The trees can contain operators that do not appear in the source program. For example, the INDIR+I node fetches an integer at an address, but there is no fetch operator in C. The operators that can appear in the trees are listed in Table 4.2. In addition to these, six more operators that are used in trees are listed in Table 4.3.

    Operator   Type suffix   Operation
    ADDRF      ...P..        Address of a parameter
    ADDRG      ...P..        Address of a global
    ADDRL      ...P..        Address of a local
    CNST       FIUP..        Constant
    BCOM       .IU...        Bitwise complement
    CVF        FI....        Convert from float
    CVI        FIU...        Convert from signed integer
    CVP        ..U...        Convert from pointer
    CVU        .IUP..        Convert from unsigned integer
    INDIR      FIUP.B        Fetch
    NEG        FI....        Negation
    ADD        FIUP..        Addition
    BAND       .IU...        Bitwise AND
    BOR        .IU...        Bitwise inclusive OR
    BXOR       .IU...        Bitwise exclusive OR
    DIV        FIU...        Division
    LSH        .IU...        Left shift
    MOD        .IU...        Modulus
    MUL        FIU...        Multiplication
    RSH        .IU...        Right shift
    SUB        FIUP..        Subtraction
    ASGN       FIUP.B        Assignment
    EQ         FIU...        Jump if equal
    GE         FIU...        Jump if greater than or equal
    GT         FIU...        Jump if greater than
    LE         FIU...        Jump if less than or equal
    LT         FIU...        Jump if less than
    NE         FIU...        Jump if not equal
    ARG        FIUP.B        Argument
    CALL       FIUPVB        Function call
    RET        FIUPV.        Function return
    JUMP       ....V.        Unconditional jump
    LABEL      ....V.        Label definition

Table 4.2: Node operators

    Operator   Operation
    AND        Logical AND
    OR         Logical OR
    NOT        Logical NOT
    COND       Conditional expression
    RIGHT      Composition
    FIELD      Bit-field access

Table 4.3: Tree operators

Semantic analysis

The semantic analysis of the source program is done while the parser recognizes the input, so there is no explicit phase in the compilation where it is done. Each parsing function detects and handles semantic errors according to the semantics of its construct. For example, when a type conversion is needed an extra conversion node is inserted in the abstract syntax tree, and the expression x = 6 generates an error if x is not defined. Many other semantic checks are done as well.

Intermediate code generation

During this stage the compiler produces directed acyclic graphs, or dags, from the trees. The compiler also eliminates common subexpressions. For example, in the expression (a+b)+b*(a+b) the value of a+b appears twice. The dag for this expression can be seen in Figure 4.2. The multiplication node (MULI4) uses the already computed values of a+b and b instead of computing them again.

The names of the nodes in dags are made up of a generic operator, a type suffix and a size indicator. The + is omitted to distinguish dags from trees. For example, ADDI4 denotes a 4-byte (32-bit) integer addition.

Figure 4.2: The dag for (a+b)+b*(a+b)
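The sharing of a+b in the dag above can be illustrated with a node table that returns an existing node whenever the same operator is applied to the same operands. This is a minimal sketch and not LCC's implementation: the struct layout, the function names dagnode() and build_example(), the fixed table size, and the use of plain variable names as leaves (instead of INDIR and ADDRG nodes) are all assumptions.

```c
#include <string.h>

#define MAXNODES 32

/* One dag node: an operator name and up to two operand nodes (by index). */
struct node {
    const char *op;   /* e.g. "ADDI4", or a variable name for a leaf */
    int left, right;  /* indices of the operands, -1 if absent */
};

static struct node nodes[MAXNODES];
static int nnodes;

/* Return the index of the node (op, left, right), creating it only if no
   identical node exists yet. Reusing an existing node is what shares a
   common subexpression. Returns -1 if the table is full. */
static int dagnode(const char *op, int left, int right) {
    int i;
    for (i = 0; i < nnodes; i++)
        if (strcmp(nodes[i].op, op) == 0
            && nodes[i].left == left && nodes[i].right == right)
            return i;
    if (nnodes == MAXNODES)
        return -1;
    nodes[nnodes].op = op;
    nodes[nnodes].left = left;
    nodes[nnodes].right = right;
    return nnodes++;
}

/* Build the dag for (a+b)+b*(a+b) and return the index of its root. */
static int build_example(void) {
    int a, b, sum1, sum2, prod;
    nnodes = 0;
    a = dagnode("a", -1, -1);
    b = dagnode("b", -1, -1);
    sum1 = dagnode("ADDI4", a, b);     /* first a+b */
    sum2 = dagnode("ADDI4", a, b);     /* second a+b: the same node again */
    prod = dagnode("MULI4", b, sum2);  /* b*(a+b) */
    return dagnode("ADDI4", sum1, prod);
}
```

After build_example() the table holds five nodes instead of the seven that a tree would need, because the leaf b and the sum a+b are each stored once. The sketch ignores assignments: a real compiler has to stop reusing a node once one of its operands may have changed.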

Trees contain operators that are not allowed in dags. The operators available for dags are listed in Table 4.2. When the dags are constructed, operators that are not allowed are replaced by other operators. For example, the operator AND is replaced by comparisons, jumps and labels.

Before the dags are passed to the back end they may be converted to trees again. Some back ends want trees and some want dags. All back ends included in the LCC distribution want trees. During the conversion, nodes that are referenced more than once because of common subexpression elimination are changed: the result of the common subexpression is stored in a temporary variable that is used instead. The resulting tree still uses the same data structures and representation as the dags, though.

Back end

LCC's back end is divided into a machine-independent part and a machine-dependent part. The front end communicates with the back end by calling a number of interface functions. In a C program, all program code is contained in functions. To generate code for a function the front end calls the interface function function(). function() uses two functions to generate code, gencode() and emitcode(). gencode() selects and orders instructions and allocates registers. emitcode() emits the assembler code for the function and also removes unnecessary register-to-register copies. These copies are left over from earlier optimizations, and it is easier to remove them here.

Selecting instructions

Instruction selection is done in the function gencode(). The instruction selectors used by LCC are generated automatically from a specification by a program called lburg. lburg is a code generator generator, and it emits a tree parser written in C. The core of an lburg specification is a tree grammar, which is a list of rules where each rule has a nonterminal on the left and a pattern of terminals and nonterminals on the right.
For example, the rule

    addr: ADDI4(reg, con)

matches a tree at an ADDI4 node if the node's first child recursively matches the nonterminal reg and the second child recursively matches the nonterminal con. Figure 4.3 shows the tree with the selected rules for the statement i = c + 2.

Figure 4.3: Tree with rules

Tree grammars are usually ambiguous, which means that there can be more than one selection of instructions that does the same thing. For example, increasing a register by one can be done by adding one to the register directly, or by loading one into another register and adding the two registers. The cheapest implementation is preferred, so a cost is assigned to each rule and the parse tree with the lowest total cost is selected.

Specifications

lburg specifications use the following format:

    %{
    configuration
    %}
    declarations
    %%
    rules
    %%
    C code

The configuration part is C code and is optional. It is copied directly into the generated file. The same applies to the C code part. The declarations part contains the start symbol and a list of all the terminals. The rules part contains tree patterns. Each rule has an assembler code template, which is a quoted string that specifies what to emit when the rule is used. Rules end with an optional cost. The following is an example of a simple specification:

    %start stmt
    %term ADDI4=309 ADDRLP1=295 ASGNI4=53
    %term CNSTI4=21 INDIRI4=67
    %%
    con:  CNSTI4             "1"
    addr: ADDRLP1            "2"
    addr: ADDI4(reg, con)    "3"
    rc:   con                "4"
    rc:   reg                "5"
    reg:  ADDI4(reg, rc)     "6" 1
    reg:  addr               "7" 1
    stmt: ASGNI4(addr, reg)  "8" 1

In this example the assembler code templates are simply rule numbers. Rule 1 states that con matches constants. Rules 2 and 3 state that addr matches trees that can be computed by address calculations, such as an ADDRLP1 or the sum of a register and a constant. Rules 4 and 5 state that rc matches a constant or a reg, and reg matches any tree that can be computed into a register. Rule 6 describes an add instruction. The first operand must be in a register and the second operand can be a register or a constant. The result is stored in a register. Rule 7 describes an instruction that loads an address into a register. Rule 8 describes an instruction that stores a register at an address.

The emitter

The emitter in the function emitcode() outputs the assembler code from the assembler templates. Each rule has one assembler template. If the template ends with a newline character, lburg assumes that it is an instruction; otherwise it is assumed to be a piece of an instruction. When the emitter emits the template it treats some characters specially. %digit tells the emitter to emit the digit-th nonterminal from the pattern, counting from zero. %c emits the nonterminal on the left side of the production. For example, the rule

    areg: ADDI4(reg, rc) "add %c,%0,%1"

might be emitted as

    add a1,r1,#60

If the template begins with #, emit2() is called to emit the instruction. This is needed to deal with tricky features in some assemblers.
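The handling of %c and %digit in a template, as described for the emitter above, amounts to a small string substitution. The function below is a sketch under stated assumptions rather than LCC's emitter: the name expand(), its parameters and its error convention are invented, and the operand texts are passed in as ready-made strings instead of being produced by recursive emission.

```c
#include <stddef.h>
#include <string.h>

/* Expand an assembler template into out (capacity outsize).
   "%c" is replaced by dst, the text for the nonterminal on the left side.
   "%0".."%9" are replaced by kids[digit], the text already produced for the
   digit-th nonterminal of the pattern, counting from zero.
   Returns 0 on success, -1 if the output does not fit or a kid is missing. */
static int expand(const char *tmpl, const char *dst,
                  const char *kids[], int nkids,
                  char *out, size_t outsize) {
    size_t n = 0;
    if (outsize == 0)
        return -1;
    for (; *tmpl != '\0'; tmpl++) {
        char literal[2];
        const char *piece;
        size_t len;
        if (tmpl[0] == '%' && tmpl[1] == 'c') {
            piece = dst;
            tmpl++;                       /* skip the 'c' */
        } else if (tmpl[0] == '%' && tmpl[1] >= '0' && tmpl[1] <= '9') {
            int k = tmpl[1] - '0';
            if (k >= nkids)
                return -1;
            piece = kids[k];
            tmpl++;                       /* skip the digit */
        } else {
            literal[0] = *tmpl;           /* ordinary character: copy as is */
            literal[1] = '\0';
            piece = literal;
        }
        len = strlen(piece);
        if (n + len + 1 > outsize)        /* keep room for the final '\0' */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Called with the template "add %c,%0,%1", the destination "a1" and the operand texts "r1" and "#60", expand() produces the line add a1,r1,#60 from the example in the text. Templates that begin with # are not handled here, since the text delegates those to emit2().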

5 Implementation

5.1 Introduction

The main goal of this thesis was the design and implementation of a new back end for the LCC compiler, targeting the DSP56002 processor. Another goal was to maintain compatibility with Motorola's C compiler, so that the generated code behaves in the same way. This means that the two compilers use the registers in the same way, use the same memory layout, use the same calling convention, and so on. As a result, code generated by Motorola's compiler, libraries for example, can be used together with code compiled by this compiler, and vice versa.

LCC is designed so that retargeting should be as easy as possible, and the included back ends consist of only about 1000 lines of code each. This chapter describes how the back end was constructed and why it looks and behaves as it does.

5.2 The compiler

The DSP56002 digital signal processor is designed to execute DSP-oriented calculations as fast as possible. As a consequence, it has an architecture that is somewhat unconventional for the C language. Because of this, some characteristics of the compiler and of the generated code are a bit unusual, and they are documented here. Since this compiler should be compatible with Motorola's compiler, this section is based on information from [5].


More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) Introduction This semester, through a project split into 3 phases, we are going

More information

VARDHAMAN COLLEGE OF ENGINEERING (AUTONOMOUS) Shamshabad, Hyderabad

VARDHAMAN COLLEGE OF ENGINEERING (AUTONOMOUS) Shamshabad, Hyderabad Introduction to MS-DOS Debugger DEBUG In this laboratory, we will use DEBUG program and learn how to: 1. Examine and modify the contents of the 8086 s internal registers, and dedicated parts of the memory

More information

LECTURE NOTES ON COMPILER DESIGN P a g e 2

LECTURE NOTES ON COMPILER DESIGN P a g e 2 LECTURE NOTES ON COMPILER DESIGN P a g e 1 (PCCS4305) COMPILER DESIGN KISHORE KUMAR SAHU SR. LECTURER, DEPARTMENT OF INFORMATION TECHNOLOGY ROLAND INSTITUTE OF TECHNOLOGY, BERHAMPUR LECTURE NOTES ON COMPILER

More information

Spoke. Language Reference Manual* CS4118 PROGRAMMING LANGUAGES AND TRANSLATORS. William Yang Wang, Chia-che Tsai, Zhou Yu, Xin Chen 2010/11/03

Spoke. Language Reference Manual* CS4118 PROGRAMMING LANGUAGES AND TRANSLATORS. William Yang Wang, Chia-che Tsai, Zhou Yu, Xin Chen 2010/11/03 CS4118 PROGRAMMING LANGUAGES AND TRANSLATORS Spoke Language Reference Manual* William Yang Wang, Chia-che Tsai, Zhou Yu, Xin Chen 2010/11/03 (yw2347, ct2459, zy2147, xc2180)@columbia.edu Columbia University,

More information

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4 Jim Lambers ENERGY 211 / CME 211 Autumn Quarter 2008-09 Programming Project 4 This project is due at 11:59pm on Friday, October 31. 1 Introduction In this project, you will do the following: 1. Implement

More information

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known. SEMANTIC ANALYSIS: Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known. Parsing only verifies that the program consists of tokens

More information

Preview from Notesale.co.uk Page 6 of 52

Preview from Notesale.co.uk Page 6 of 52 Binary System: The information, which it is stored or manipulated by the computer memory it will be done in binary mode. RAM: This is also called as real memory, physical memory or simply memory. In order

More information

GBIL: Generic Binary Instrumentation Language. Language Reference Manual. By: Andrew Calvano. COMS W4115 Fall 2015 CVN

GBIL: Generic Binary Instrumentation Language. Language Reference Manual. By: Andrew Calvano. COMS W4115 Fall 2015 CVN GBIL: Generic Binary Instrumentation Language Language Reference Manual By: Andrew Calvano COMS W4115 Fall 2015 CVN Table of Contents 1) Introduction 2) Lexical Conventions 1. Tokens 2. Whitespace 3. Comments

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

Course Outline Introduction to C-Programming

Course Outline Introduction to C-Programming ECE3411 Fall 2015 Lecture 1a. Course Outline Introduction to C-Programming Marten van Dijk, Syed Kamran Haider Department of Electrical & Computer Engineering University of Connecticut Email: {vandijk,

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Institutionen för systemteknik Department of Electrical Engineering

Institutionen för systemteknik Department of Electrical Engineering Institutionen för systemteknik Department of Electrical Engineering Examensarbete Automatic Parallel Memory Address Generation for Parallel DSP Computing Master thesis performed in Computer Engineering

More information

CS 4201 Compilers 2014/2015 Handout: Lab 1

CS 4201 Compilers 2014/2015 Handout: Lab 1 CS 4201 Compilers 2014/2015 Handout: Lab 1 Lab Content: - What is compiler? - What is compilation? - Features of compiler - Compiler structure - Phases of compiler - Programs related to compilers - Some

More information

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee C Language Part 1 (Minor modifications by the instructor) References C for Python Programmers, by Carl Burch, 2011. http://www.toves.org/books/cpy/ The C Programming Language. 2nd ed., Kernighan, Brian,

More information

QUESTION BANK CHAPTER 1 : OVERVIEW OF SYSTEM SOFTWARE. CHAPTER 2: Overview of Language Processors. CHAPTER 3: Assemblers

QUESTION BANK CHAPTER 1 : OVERVIEW OF SYSTEM SOFTWARE. CHAPTER 2: Overview of Language Processors. CHAPTER 3: Assemblers QUESTION BANK CHAPTER 1 : OVERVIEW OF SYSTEM SOFTWARE 1) Explain Analysis-synthesis model/fron end backend model of compiler 2) Explain various phases of compiler and symbol table. Consider the statement

More information

ADDRESS GENERATION UNIT (AGU)

ADDRESS GENERATION UNIT (AGU) nc. SECTION 4 ADDRESS GENERATION UNIT (AGU) MOTOROLA ADDRESS GENERATION UNIT (AGU) 4-1 nc. SECTION CONTENTS 4.1 INTRODUCTION........................................ 4-3 4.2 ADDRESS REGISTER FILE (Rn)............................

More information

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573 What is a compiler? Xiaokang Qiu Purdue University ECE 573 August 21, 2017 What is a compiler? What is a compiler? Traditionally: Program that analyzes and translates from a high level language (e.g.,

More information

DEMO A Language for Practice Implementation Comp 506, Spring 2018

DEMO A Language for Practice Implementation Comp 506, Spring 2018 DEMO A Language for Practice Implementation Comp 506, Spring 2018 1 Purpose This document describes the Demo programming language. Demo was invented for instructional purposes; it has no real use aside

More information

Project Compiler. CS031 TA Help Session November 28, 2011

Project Compiler. CS031 TA Help Session November 28, 2011 Project Compiler CS031 TA Help Session November 28, 2011 Motivation Generally, it s easier to program in higher-level languages than in assembly. Our goal is to automate the conversion from a higher-level

More information

CS 6353 Compiler Construction Project Assignments

CS 6353 Compiler Construction Project Assignments CS 6353 Compiler Construction Project Assignments In this project, you need to implement a compiler for a language defined in this handout. The programming language you need to use is C or C++ (and the

More information

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language 1 History C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell Labs. C was originally first implemented on the DEC

More information

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Intermediate representations and code generation Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Today Intermediate representations and code generation Scanner Parser Semantic

More information

CS 360 Programming Languages Interpreters

CS 360 Programming Languages Interpreters CS 360 Programming Languages Interpreters Implementing PLs Most of the course is learning fundamental concepts for using and understanding PLs. Syntax vs. semantics vs. idioms. Powerful constructs like

More information

Compiler Front-End. Compiler Back-End. Specific Examples

Compiler Front-End. Compiler Back-End. Specific Examples Compiler design Overview Compiler Front-End What is a compiler? Lexical Analysis Syntax Analysis Parsing Compiler Back-End Code Generation Register Allocation Optimization Specific Examples lex yacc lcc

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Pioneering Compiler Design

Pioneering Compiler Design Pioneering Compiler Design NikhitaUpreti;Divya Bali&Aabha Sharma CSE,Dronacharya College of Engineering, Gurgaon, Haryana, India nikhita.upreti@gmail.comdivyabali16@gmail.com aabha6@gmail.com Abstract

More information

CS401 - Computer Architecture and Assembly Language Programming Glossary By

CS401 - Computer Architecture and Assembly Language Programming Glossary By CS401 - Computer Architecture and Assembly Language Programming Glossary By absolute address : A virtual (not physical) address within the process address space that is computed as an absolute number.

More information

4. An interpreter is a program that

4. An interpreter is a program that 1. In an aboslute loading scheme, which loader function is accomplished by programmer? A. Allocation B. LInking C. Reallocation D. both (A) and (B) 2. A compiler program written in a high level language

More information

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program. Language Translation Compilation vs. interpretation Compilation diagram Step 1: compile program compiler Compiled program Step 2: run input Compiled program output Language Translation compilation is translation

More information

Chapter 3:: Names, Scopes, and Bindings (cont.)

Chapter 3:: Names, Scopes, and Bindings (cont.) Chapter 3:: Names, Scopes, and Bindings (cont.) Programming Language Pragmatics Michael L. Scott Review What is a regular expression? What is a context-free grammar? What is BNF? What is a derivation?

More information

Compiler Construction Assignment 3 Spring 2018

Compiler Construction Assignment 3 Spring 2018 Compiler Construction Assignment 3 Spring 2018 Robert van Engelen µc for the JVM µc (micro-c) is a small C-inspired programming language. In this assignment we will implement a compiler in C++ for µc.

More information

Chapter 2A Instructions: Language of the Computer

Chapter 2A Instructions: Language of the Computer Chapter 2A Instructions: Language of the Computer Copyright 2009 Elsevier, Inc. All rights reserved. Instruction Set The repertoire of instructions of a computer Different computers have different instruction

More information

When do We Run a Compiler?

When do We Run a Compiler? When do We Run a Compiler? Prior to execution This is standard. We compile a program once, then use it repeatedly. At the start of each execution We can incorporate values known at the start of the run

More information

What do Compilers Produce?

What do Compilers Produce? What do Compilers Produce? Pure Machine Code Compilers may generate code for a particular machine, not assuming any operating system or library routines. This is pure code because it includes nothing beyond

More information

CPS 506 Comparative Programming Languages. Syntax Specification

CPS 506 Comparative Programming Languages. Syntax Specification CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

More information

CS606- compiler instruction Solved MCQS From Midterm Papers

CS606- compiler instruction Solved MCQS From Midterm Papers CS606- compiler instruction Solved MCQS From Midterm Papers March 06,2014 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 Final Term MCQ s and Quizzes CS606- compiler instruction If X is a

More information

ECE Digital System Design & Synthesis Exercise 1 - Logic Values, Data Types & Operators - With Answers

ECE Digital System Design & Synthesis Exercise 1 - Logic Values, Data Types & Operators - With Answers ECE 601 - Digital System Design & Synthesis Exercise 1 - Logic Values, Data Types & Operators - With Answers Fall 2001 Final Version (Important changes from original posted Exercise 1 shown in color) Variables

More information

Compilers and Interpreters

Compilers and Interpreters Overview Roadmap Language Translators: Interpreters & Compilers Context of a compiler Phases of a compiler Compiler Construction tools Terminology How related to other CS Goals of a good compiler 1 Compilers

More information

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 ) Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 ) lizheng@mail.buct.edu.cn Lexical and Syntax Analysis Topic Covered Today Compilation Lexical Analysis Semantic Analysis Compilation Translating from high-level

More information

Formats of Translated Programs

Formats of Translated Programs Formats of Translated Programs Compilers differ in the format of the target code they generate. Target formats may be categorized as assembly language, relocatable binary, or memory-image. Assembly Language

More information