UNIT II ASSEMBLERS

Prepared by P.Vasantha kumari

2.1 Basic assembler functions

Assembler

An assembler converts assembly language programs into object files. Object files contain a combination of machine instructions, data, and the information needed to place instructions properly in memory.

Figure 2.1 - Assembler

Assembler functions
o Convert mnemonic operation codes to their machine language equivalents.
o Convert symbolic operands to their equivalent machine addresses.
o Build the machine instructions in the proper format.
o Convert the data constants to internal machine representations.
o Write the object program and the assembly listing.
o Provide error checking.
o Allow changes to be incorporated quickly and easily by reassembling the program.

Features of assemblers
o Mnemonic operation codes
o Symbolic operands
o Data declarations

Assembly Language Statements

Three types of statements are used in assembly language:
1. Imperative Statements
2. Declaration Statements
3. Assembler Directives

1. Imperative Statements
An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement translates into one machine instruction.

2. Declaration Statements
A declaration statement declares a constant or reserves storage for data. It does not specify an action to be performed at execution time.

3. Assembler Directives
An assembler directive instructs the assembler to perform certain actions during the assembly of a program. Directives can be used to declare variables, create storage space and declare constants. Some of the assembler directives are:

START   Specify the name and starting address of the program
END     Indicate the end of the source program and (optionally) specify the first executable instruction in the program
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
WORD    Generate a one-word integer constant

Statement Format

All statements in an assembly language program have the form

[LABEL] <OPCODE> <OPERAND> COMMENTS

Label: An optional identifier that names the location of the data or code on that line. The maximum label length depends on the assembler; most assemblers support labels up to 32 characters long. A label begins with a letter [A-Z] and is suffixed by a colon (:).
Example: START: LDA #24

OPCODE: Contains the mnemonic for the operation code, that is, the machine instruction to be executed. Most opcodes also require operands.

OPERAND: Specifies the data the instruction works on: a constant, a label, immediate data, a register, or an address.
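As an illustration of the statement format above, the following minimal sketch splits one source line into its four fields. The names `Statement` and `parse_statement` are hypothetical, and the use of `;` to introduce the comment field is an assumption, since the notes do not name a comment marker.

```python
import re
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    opcode: str
    operand: Optional[str]
    comment: Optional[str]


# Assumption: ';' starts the comment field (not specified in the notes).
_STATEMENT = re.compile(
    r"^\s*(?:(?P<label>[A-Z][A-Z0-9]*):\s*)?"  # optional LABEL followed by ':'
    r"(?P<opcode>\+?[A-Z]+)"                    # mnemonic, optionally prefixed by '+'
    r"(?:\s+(?P<operand>[^;\s]+))?"             # optional operand without spaces
    r"\s*(?:;\s*(?P<comment>.*))?$"             # optional comment
)


def parse_statement(line: str) -> Statement:
    """Split one assembly statement into label, opcode, operand and comment."""
    m = _STATEMENT.match(line)
    if m is None:
        raise ValueError(f"cannot parse statement: {line!r}")
    return Statement(m["label"], m["opcode"], m["operand"], m["comment"])


print(parse_statement("START: LDA #24"))
```

Fields that are absent come back as `None`, which mirrors the optional label and comment fields of the format.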

Advantages of assembler
o It reduces errors.
o Translation times are faster.
o Changes can be made more easily and quickly.

Disadvantages of assembler
o Many instructions are required to achieve small tasks.
o Source programs tend to be large and difficult to follow.
o Programmers require knowledge of the processor architecture and instruction set.
o Programs are machine dependent and require complete rewrites if the hardware is changed.

2.2 A simple SIC assembler

Assembler Function

A simple SIC (Simplified Instructional Computer) assembler performs the following functions:
o Convert mnemonic operation codes to their machine language equivalents.
o Convert symbolic operands to their equivalent machine addresses.
o Build the machine instructions in the proper format.
o Convert the data constants to internal machine representations.
o Write the object program and the assembly listing.

Assembler directives

The SIC assembler language has the following assembler directives:

START   Specify the name and starting address of the program
END     Indicate the end of the source program and (optionally) specify the first executable instruction in the program
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
WORD    Generate a one-word integer constant

RESB    Reserve the indicated number of bytes for a data area
RESW    Reserve the indicated number of words for a data area

Assemblers

An assembler converts assembly language into machine code or object code. There are two types of assemblers:
o Two pass assembler
    Pass 1 assembler
    Pass 2 assembler
o One pass assembler

In a two pass assembler, the first pass scans the source program for label definitions and assigns addresses, whereas the second pass performs most of the actual translation.

The assembler must also process assembler directives. These statements are not translated into machine instructions; they provide instructions to the assembler itself. The assembler directive START specifies the starting address of the object program, and END marks the end of the program.

The assembler must write the object code onto some output device. This object program will later be loaded into memory for execution. An object program contains three types of records:
o Header
o Text
o End

The Header record contains the program name, starting address and length.
The Text record contains the translated instructions and data of the program, together with an indication of the addresses where these are to be loaded.
The End record marks the end of the object program and specifies the address in the program where execution is to begin.

Functions of Assemblers

Pass 1 (define symbols)
o Assign addresses to all statements in the program.
o Save the values (addresses) assigned to all labels for use in Pass 2.
o Perform some processing of assembler directives, including processing that affects address assignment, such as determining the length of data areas defined by BYTE, RESW, etc.

Pass 2 (assemble instructions and generate object program)
o Assemble instructions, translating operation codes and looking up addresses.
o Generate data values defined by BYTE, WORD, etc.
o Perform processing of assembler directives not done during Pass 1.
o Write the object program and the assembly listing.

Format of Object Program

Header Record
Col. 1       H
Col. 2-7     Program name
Col. 8-13    Starting address of object program (hexadecimal)
Col. 14-19   Length of object program in bytes (hexadecimal)

Text Record
Col. 1       T
Col. 2-7     Starting address for object code in this record (hexadecimal)
Col. 8-9     Length of object code in this record in bytes (hexadecimal)
Col. 10-69   Object code

End Record
Col. 1       E
Col. 2-7     Address of first executable instruction in object program (hexadecimal)
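The record layouts above can be sketched as three small formatting helpers. This is only an illustration under stated assumptions: the function names are hypothetical, the program name is padded with blanks to six columns, and no `^` field separators are emitted (those are a readability aid in printed listings, not part of the column layout).

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', Col. 2-7 name, Col. 8-13 start address, Col. 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_code: str) -> str:
    """Col. 1 'T', Col. 2-7 start address, Col. 8-9 byte count, Col. 10-69 code."""
    if len(object_code) % 2 or len(object_code) > 60:
        raise ValueError("object code must be whole bytes and fit in columns 10-69")
    return f"T{start:06X}{len(object_code) // 2:02X}{object_code}"


def end_record(first_instruction: int) -> str:
    """Col. 1 'E', Col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"


print(header_record("MAIN", 0x2000, 0x14))
print(text_record(0x2000, "00200C0C200F"))
print(end_record(0x2000))
```

Because columns 10-69 hold 60 hexadecimal digits, one Text record carries at most 30 bytes of object code; a longer run has to be split across several records.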

Example

(i) Assembly Language Program with Object Code

LOCCTR  LABEL   OPCODE  OPERAND  OBJ.CODE
        MAIN    START   2000
2000    BEGIN   LDA     NUM1     00200C
2003            STA     NUM2     0C200F
2006            LDCH    CHAR1    502012
2009            STCH    CHAR2    542013
200C    NUM1    WORD    5        000005
200F    NUM2    RESW    1
2012    CHAR1   BYTE    C'A'     41
2013    CHAR2   RESB    1
2014            END     BEGIN

(ii) Object Program

H^MAIN^002000^000014
T^002000^0F^00200C^0C200F^502012^542013^000005
T^002012^01^41
E^002000

2.3 Assembler algorithm and Data structures

Internal data structures
o Operation Code Table (OPTAB)
o Symbol Table (SYMTAB)
o Location Counter (LOCCTR)

OPTAB
o OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.

o In most cases, OPTAB is a static table.
o OPTAB must contain the mnemonic operation code and its machine language equivalent.
o In more complex assemblers, OPTAB also contains information about instruction format and length.
o OPTAB is usually organized as a hash table, with the mnemonic operation code as the key.

SYMTAB
o SYMTAB is used to store values (addresses) assigned to labels.
o SYMTAB includes the name and value (address) for each label in the source program, together with flags to indicate error conditions (e.g., a symbol defined in two different places).
o This table may also contain information, such as type or length, about the data area or instruction labeled.
o SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval. The label is the key; because labels are typically non-random keys, the hash function should perform well on such keys.

LOCCTR
o This is a variable that is used to help in the assignment of addresses.
o LOCCTR is initialized to the beginning address specified in the START statement.
o After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR.
o When a label is reached, the current value of LOCCTR gives the address to be associated with that label.

2.3.1 Two Pass Assemblers

A two pass assembler translates the assembly language program into object code (machine code) in two passes, pass 1 and pass 2. The pass 1 algorithm scans the source program for label definitions and assigns addresses, whereas the pass 2 algorithm performs the actual translation.
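To make the roles of OPTAB, SYMTAB and LOCCTR concrete, here is a minimal sketch of the two passes for plain SIC statements. It is an illustration, not the assembler described in these notes: the function names, the tuple representation of a source line, the four-entry `OPTAB` and the choice to raise exceptions instead of setting error flags are all assumptions, and indexed addressing is not handled.

```python
# Subset of the SIC operation code table: mnemonic -> opcode byte.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "LDCH": 0x50, "STCH": 0x54}


def constant_bytes(operand: str) -> bytes:
    """Bytes generated by a BYTE operand written as C'...' or X'...'."""
    kind, body = operand[0], operand[2:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)


def pass1(lines):
    """Assign an address to every statement and record label addresses in SYMTAB."""
    symtab, located = {}, []
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)
        elif opcode != "END" and label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        located.append((locctr, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3                      # one instruction or one word
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += len(constant_bytes(operand))
        elif opcode not in ("START", "END"):
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, located, locctr - start


def pass2(symtab, located):
    """Translate each located statement into hexadecimal object code."""
    listing = []
    for addr, opcode, operand in located:
        code = ""
        if opcode in OPTAB:
            target = symtab[operand] if operand else 0   # KeyError: undefined symbol
            code = f"{OPTAB[opcode]:02X}{target:04X}"
        elif opcode == "WORD":
            code = f"{int(operand):06X}"
        elif opcode == "BYTE":
            code = constant_bytes(operand).hex().upper()
        listing.append((addr, code))
    return listing


SOURCE = [
    ("MAIN", "START", "2000"), ("BEGIN", "LDA", "NUM1"), ("", "STA", "NUM2"),
    ("", "LDCH", "CHAR1"), ("", "STCH", "CHAR2"), ("NUM1", "WORD", "5"),
    ("NUM2", "RESW", "1"), ("CHAR1", "BYTE", "C'A'"), ("CHAR2", "RESB", "1"),
    ("", "END", "BEGIN"),
]
symtab, located, length = pass1(SOURCE)
for addr, code in pass2(symtab, located):
    print(f"{addr:04X} {code}")
```

The first pass needs only statement lengths, which is why it can run before any operand address is known; the second pass can then resolve every symbol with a single SYMTAB lookup.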

Figure 2.2 Two Pass Assembler

Pass 1 of a two pass assembler

Step 1: Read the input line.
Step 2: Check whether the opcode field in the input line is START.
   (i) If there is an operand after START, initialize the LOCCTR to the operand value.
   (ii) Otherwise, if there is no value in the operand field, set the LOCCTR to zero.
Step 3: Write the line to the intermediate file.
Step 4: Repeat the following for the other lines in the program until the opcode field contains the END directive.
   1. If there is a symbol in the label field:
      i. Check the symbol table to see whether it has already been stored there. If so, it is a duplicate symbol and an error message should be displayed.
      ii. Otherwise, enter the symbol into SYMTAB along with the memory address at which it is stored.
   2. If there is an opcode in the opcode field:
      i. Search OPTAB to see whether the opcode is present; if so, increment the location counter (LOCCTR) by three.
      ii. a) If the opcode is WORD, increment the LOCCTR by three.
          b) If the opcode is BYTE, increment the LOCCTR by the length of the constant in bytes (one for a single-character constant).
          c) If the opcode is RESW, increment the LOCCTR by three times the integer value of the operand.
          d) If the opcode is RESB, increment the LOCCTR by the integer value of the operand.
   3. Write every processed line to the intermediate file along with its location counter value.
Step 5: Calculate the length of the program by subtracting the starting address of the program from the final value of the LOCCTR.

Algorithm

Read first input line
if OPCODE = 'START' then
    Save #[OPERAND] as starting address
    Initialize LOCCTR to starting address
    Write line to intermediate file
    Read next input line
End {if START}
else
    Initialize LOCCTR to 0
While OPCODE ≠ 'END' do
    if this is not a comment line then
        if there is a symbol in the LABEL field then
            Search SYMTAB for LABEL
            if found then
                Set error flag (duplicate symbol)
            else
                Insert (LABEL, LOCCTR) into SYMTAB
        End {if symbol}
        Search OPTAB for OPCODE
        if found then
            Add 3 {instruction length} to LOCCTR
        else if OPCODE = 'WORD' then
            Add 3 to LOCCTR
        else if OPCODE = 'RESW' then
            Add 3 * #[OPERAND] to LOCCTR
        else if OPCODE = 'RESB' then
            Add #[OPERAND] to LOCCTR
        else if OPCODE = 'BYTE' then
            Find length of constant in bytes
            Add length to LOCCTR
        End {if BYTE}
        else
            Set error flag (invalid operation code)
    End {if not a comment}
    Write line to intermediate file
    Read next input line
End {while not END}
Write last line to intermediate file
Save (LOCCTR - starting address) as program length
End {pass 1}

Example

Input - Assembly Language Program

MAIN    START   2000
BEGIN   LDA     NUM1
        STA     NUM2
        LDCH    CHAR1
        STCH    CHAR2
NUM1    WORD    5
NUM2    RESW    1
CHAR1   BYTE    C'A'
CHAR2   RESB    1
        END     BEGIN

Output - Addresses assigned to instructions (** marks an empty label field)

        MAIN    START   2000
2000    BEGIN   LDA     NUM1
2003    **      STA     NUM2
2006    **      LDCH    CHAR1
2009    **      STCH    CHAR2
200C    NUM1    WORD    5
200F    NUM2    RESW    1
2012    CHAR1   BYTE    C'A'
2013    CHAR2   RESB    1
2014            END     BEGIN

Output - Symbol table

BEGIN   2000
NUM1    200C
NUM2    200F
CHAR1   2012
CHAR2   2013

Pass 2 of a two pass assembler

Step 1: Read the first line from the intermediate file.
Step 2: Check whether the opcode field in the input line is START; if so, write the line onto the final output file.
Step 3: Repeat the following for the other lines in the intermediate file until the opcode field contains the END directive.
   1. If there is a symbol in the operand field, the object code is assembled by combining the machine code equivalent of the instruction with the symbol address.
   2. If there is no symbol in the operand field, the operand address is set to zero and assembled with the machine code equivalent of the instruction.

   3. If the opcode field is BYTE or WORD, convert the constant in the operand field to object code.
   4. Write the input line along with the object code onto the final output file.
Step 4: Close all the opened files and exit.

Algorithm

Read first input line {from intermediate file}
if OPCODE = 'START' then
    Write listing line
    Read next input line
End {if START}
Write Header record to object program
Initialize first Text record
While OPCODE ≠ 'END' do
    if this is not a comment line then
        Search OPTAB for OPCODE
        if found then
            if there is a symbol in the OPERAND field then
                Search SYMTAB for OPERAND
                if found then
                    Store symbol value as operand address
                else
                    Store 0 as operand address
                    Set error flag (undefined symbol)
                End
            End {if symbol}
            else
                Store 0 as operand address
            Assemble the object code instruction
        End {if opcode found}
        else if OPCODE = 'BYTE' or 'WORD' then
            Convert constant to object code
        if object code will not fit into the current Text record then
            Write Text record to object program
            Initialize new Text record
        End
        Add object code to Text record
    End {if not comment}
    Write listing line
    Read next input line
End {while not END}
Write last Text record to object program
Write End record to object program
Write last listing line
End {pass 2}

Example

Input - Assembly Language Program with addresses (** marks an empty label field)

        MAIN    START   2000
2000    BEGIN   LDA     NUM1
2003    **      STA     NUM2
2006    **      LDCH    CHAR1
2009    **      STCH    CHAR2
200C    NUM1    WORD    5
200F    NUM2    RESW    1
2012    CHAR1   BYTE    C'A'
2013    CHAR2   RESB    1
2014            END     BEGIN

Input - Symbol table

BEGIN   2000
NUM1    200C
NUM2    200F
CHAR1   2012
CHAR2   2013

Output - Object Code

        MAIN    START   2000
2000    BEGIN   LDA     NUM1     00200C
2003            STA     NUM2     0C200F
2006            LDCH    CHAR1    502012
2009            STCH    CHAR2    542013
200C    NUM1    WORD    5        000005
200F    NUM2    RESW    1
2012    CHAR1   BYTE    C'A'     41
2013    CHAR2   RESB    1
2014            END     BEGIN

2.4 Machine Dependent Assembler Features

o Register-to-register instructions are shorter and do not require another memory reference. Use register-to-register instructions instead of register-to-memory instructions whenever possible.
o Most register-to-memory instructions are assembled using either program counter relative addressing or base relative addressing.

o If extended format is not specified, the assembler first attempts to translate the instruction using program counter relative addressing. If the required displacement is out of range, the assembler then attempts to use base relative addressing.
o If neither form of relative addressing is applicable, the instruction cannot be properly assembled and the assembler must generate an error message.
o If the required displacement is too large for both forms of relative addressing, the 4-byte extended instruction format must be used. The programmer must specify the 4-byte format by adding the prefix + to the operation code in the source statement.
o The assembler directive BASE is used in conjunction with base relative addressing.
o Indirect addressing is indicated by adding the prefix @ to the operand.
o Immediate addressing is specified with the prefix # on the immediate operand.

There are two important machine dependent assembler features:
o Instruction Format and Addressing Modes
o Program Relocation

(i) Instruction Format and Addressing Modes

Instruction Format:
o The programmer must specify the 4-byte format by adding the prefix + to the operation code in the source statement.
o If the required displacement is too large, the 4-byte extended instruction format must be used.
o If extended format is not specified, the assembler first attempts to translate the instruction using program counter relative addressing.

Addressing Modes:
o Most register-to-memory instructions are assembled using either program counter relative addressing or base relative addressing.
o Immediate addressing is specified with the prefix # on the immediate operand.
o If neither program counter relative nor base relative addressing can be used, the 4-byte extended instruction format, which has a 20-bit address field, must be used.

(a) Program counter relative addressing

Consider the following assembly language program:

LINE  LOCCTR  LABEL    OPCODE  OPERANDS
10    0000    FIRST    STL     RETADR
12    0003             LDB     #LENGTH
13                     BASE    LENGTH
15    0006    CLOOP    +JSUB   RDREC
40    0017             J       CLOOP
95    0030    RETADR   RESW    1
100   0033    LENGTH   RESW    1
125   1036    RDREC    CLEAR   X
133   103C             +LDT    #4096
160   104E             STCH    BUFFER,X

Example 1: Consider the statement in line 10. During instruction execution on SIC/XE, the program counter is advanced after the instruction is fetched and before it is executed, so (PC) holds the address of the instruction that follows line 10. RETADR is assigned the address 0030. Now calculate the displacement.

For program counter relative addressing, TA = (PC) + Disp
Target address (TA) = RETADR = 030 and (PC) = 003
Disp = TA - (PC) = 030 - 003

TA   = 030 = 0000 0011 0000
(PC) = 003 = 0000 0000 0011   (subtract)
Disp       = 0000 0010 1101 = 02D

The instruction format for this instruction using program counter relative addressing is shown below.

Field     Opcode   n i x b p e   Disp
Bits      6        1 1 1 1 1 1   12
Value     000101   1 1 0 0 1 0   0000 0010 1101
Object code: 1 7 2 0 2 D

Example 2: Consider the statement in line 40, a jump to the label CLOOP, which is already defined at address 0006.

For program counter relative addressing, TA = (PC) + Disp
Target address (TA) = CLOOP = 006 and (PC) = 01A, the address of the instruction that follows line 40
Disp = TA - (PC) = 006 - 01A

TA   = 006 = 0000 0000 0110
(PC) = 01A = 0000 0001 1010   (subtract)
Disp       = 1111 1110 1100 = FEC

The displacement is negative and is stored in two's complement form. The instruction format for this instruction using program counter relative addressing is shown below.

Field     Opcode   n i x b p e   Disp
Bits      6        1 1 1 1 1 1   12
Value     001111   1 1 0 0 1 0   1111 1110 1100
Object code: 3 F 2 F E C

(b) Base relative addressing

Consider the statement in line 160, STCH BUFFER,X, which stores the character in register A into BUFFER, indexed by register X. The BASE directive in line 13 tells the assembler that the base register holds the address of LENGTH, 0033, which is loaded by LDB #LENGTH in line 12. Program counter relative addressing cannot be used here because the displacement from (PC) = 1051 to BUFFER does not fit in the 12-bit displacement field. The displacement is therefore calculated using base relative addressing:

TA = (B) + Disp
Target address (TA) = BUFFER = 036 and (B) = 033
Disp = TA - (B) = 036 - 033 = 003

With x = 1 and b = 1, the assembled instruction is 57C003.

(c) Immediate addressing

Consider the statement in line 12, LDB #LENGTH, which loads the address of LENGTH into the base register. In immediate addressing, the immediate value is placed in the displacement field. If the value fits into 12 bits, a format 3 instruction is used; otherwise a format 4 instruction is used, as in line 133 (+LDT #4096).

(ii) Program Relocation

An object program that contains the information necessary to perform address modification is called a relocatable program. The assembler identifies for the loader those parts of the object program that need modification. The memory addresses of operands must be modified according to the load address, while constant data must remain unchanged. The assembler passes this information to the loader in Modification records.

Program relocation is needed for the following reasons:

o It is desirable to load and run several programs at the same time.
o The system must be able to load programs into memory wherever there is room.
o The exact starting address of the program is not known until load time.

The modification record

The assembler produces a Modification record describing the address and length of each address field to be modified. The loader adds the beginning address of the loaded program to the address field specified by a Modification record.

Col. 1     M
Col. 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col. 8-9   Length of the address field to be modified, in half-bytes

Example

Figure 2.3 Program Relocation
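The loader's use of Modification records can be sketched as follows. This is a minimal illustration under stated assumptions: `relocate` is a hypothetical name, each record is passed as an (offset, half-bytes) pair, and the program image is held in a `bytearray`. The sample object code 4B101036 for +JSUB RDREC is derived here from the JSUB opcode (48), the format 4 flag bits and the address 1036; it is not quoted from the notes.

```python
def relocate(memory: bytearray, mod_records, load_address: int) -> None:
    """Add load_address to every address field named by a Modification record.

    Each record is (offset, half_bytes): the field begins `offset` bytes from
    the start of the program and is `half_bytes` half-bytes (4-bit units) long.
    When the length is odd, the field occupies the low-order half-bytes of the
    bytes read, and the leading half-byte (flag bits) is left untouched.
    """
    for offset, half_bytes in mod_records:
        n_bytes = (half_bytes + 1) // 2
        field = int.from_bytes(memory[offset:offset + n_bytes], "big")
        mask = (1 << (4 * half_bytes)) - 1
        updated = (field & ~mask) | ((field + load_address) & mask)
        memory[offset:offset + n_bytes] = updated.to_bytes(n_bytes, "big")


# Program image with +JSUB RDREC at relative address 0006 (bytes 6..9).
image = bytearray(6) + bytearray.fromhex("4B101036")
# Modification record for that instruction: address field starts at 0007, 5 half-bytes.
relocate(image, [(0x0007, 5)], load_address=0x5000)
print(image[6:10].hex().upper())
```

Only the 20-bit address field changes; the opcode and flag bits keep their values, and data that has no Modification record is never touched.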

2.5 Machine Independent Assembler Features

o Literals
o Symbol-defining statements
o Expressions
o Program blocks

(i) Literals

It is convenient for the programmer to be able to write the value of a constant operand as a part of the instruction that uses it. Such an operand is called a literal.

45    001A    ENDFIL  LDA     =C'EOF'    032010

In this assembler language notation, a literal is identified with the prefix =, which is followed by a specification of the literal value.

The difference between a literal and an immediate operand
o With immediate addressing, the operand value is assembled as a part of the machine instruction.
    55    0020            LDA     #3         010003
o With a literal, the assembler generates the specified value as a constant at some other memory location. The address of this generated constant is used as the target address for the machine instruction.
    45    001A    ENDFIL  LDA     =C'EOF'    032010

Literal pool
o All of the literal operands used in the program are gathered together into one or more literal pools.
o Normally literals are placed into a pool at the end of the program.
o Sometimes it is desirable to place literals into a pool at some other location in the object program. The LTORG directive is provided for this purpose.
o When the assembler encounters an LTORG, it creates a pool that contains all of the literals used since the previous LTORG.

Literal for current value of location counter

The current value of the location counter can be denoted by *, which may also be used as a literal operand (=*).

        BASE    *
        LDB     =*

If the literal =* is used more than once in a program, the occurrences have identical names but different values, and each of them must appear in the literal pool.

Handling duplicate literals

The assembler should avoid storing duplicate literals. The easiest way to recognize duplicate literals is by comparing the character strings that define them. For example,

215   1062    WLOOP   TD      =X'05'
230   106B            WD      =X'05'

In this case the literal =X'05' is used twice. To avoid duplication, its details are entered into the literal table LITTAB only once.

Literal table (LITTAB)

The basic data structure needed to process literal operands is a literal table (LITTAB). LITTAB is often organized as a hash table, using the literal name or value as the key. Each entry consists of the literal name, its hexadecimal value, its length and its address.

Example

Literal   Hexadecimal value   Length   Address
C'EOF'    454F46              3        002D
X'05'     05                  1        1076

Implementation of Literals

Pass 1
o For each recognized literal operand, search LITTAB. If the literal is already present in the table, no action is needed; if it is not present, the literal is added to LITTAB without assigning its address.
o When a LTORG statement or the end of the program is encountered, the assembler scans LITTAB and assigns an address to each literal.
o The location counter is updated to reflect the number of bytes occupied by each literal.
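The Pass 1 handling of literals described above can be sketched with a small table class. The class and method names are hypothetical, a Python dict stands in for the hash table, and only =C'...' and =X'...' literals are handled (the =* literal, whose value depends on where it appears, is left out of this sketch).

```python
def literal_bytes(name: str) -> bytes:
    """Value of a literal written as =C'...' or =X'...'."""
    kind, body = name[1], name[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)


class LiteralTable:
    def __init__(self):
        # literal name -> {"value": bytes, "address": int or None}
        self.entries = {}

    def reference(self, name: str) -> None:
        """Record a literal operand; a duplicate literal is stored only once."""
        if name not in self.entries:
            self.entries[name] = {"value": literal_bytes(name), "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: assign addresses to unplaced literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr


littab = LiteralTable()
littab.reference("=C'EOF'")
after_first_pool = littab.place_pool(0x002D)      # pool created by LTORG
littab.reference("=X'05'")
littab.reference("=X'05'")                         # duplicate, ignored
after_last_pool = littab.place_pool(0x1076)       # pool created at END
for name, entry in littab.entries.items():
    print(name, entry["value"].hex().upper(), len(entry["value"]), f"{entry['address']:04X}")
```

Keying the table on the literal's source text is what makes duplicate detection a simple lookup, which matches the string-comparison approach described in the notes.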

Pass 2
o Search LITTAB for each literal operand encountered.
o The data values specified by the literals in each literal pool are inserted at the appropriate places in the object program, in the same way as values generated by BYTE or WORD statements.
o If a literal value represents an address in the program, the assembler must generate the appropriate Modification record.

(ii) Symbol-Defining Statements

EQU directive

Assemblers provide an assembler directive EQU that allows the programmer to define symbols and specify their values. The general syntax is:

symbol  EQU     value

When the assembler encounters an EQU statement, it enters the symbol into SYMTAB with the specified value.

Uses of EQU

(i) Establish symbolic names that can be used in place of numeric values for improved readability. For example, the statement

        +LDT    #4096

can be written as

MAXLEN  EQU     4096
        +LDT    #MAXLEN

(ii) Define mnemonic names for registers.

A       EQU     0
X       EQU     1
L       EQU     2

(iii) Establish and use names that reflect the logical function of the registers in the program.

BASE    EQU     R1
COUNT   EQU     R2
INDEX   EQU     R3

ORG directive

The assembler directive ORG is used to assign values to symbols indirectly. The general syntax is:

        ORG     value

where value is a constant or an expression involving constants and previously defined symbols. When this statement is encountered, the assembler resets its location counter (LOCCTR) to the specified value. Since the values of symbols are taken from LOCCTR, the ORG statement affects the values of all labels defined until the next ORG.

Use of ORG for label definition

Suppose that we want to define a symbol table with the following structure:

SYMBOL field - 6 bytes
VALUE field  - 3 bytes (1 word)
FLAGS field  - 2 bytes

The SYMBOL field contains user-defined symbols, the VALUE field holds the value assigned to the symbol, and the FLAGS field specifies the symbol type and other information.

STAB (100 entries)
SYMBOL    VALUE    FLAGS

To reserve space for the symbol table, we can write

STAB    RESB    1100

Each entry occupies 6 + 3 + 2 = 11 bytes, so 1100 bytes are reserved in total for 100 entries. The EQU directive can be used to define the labels SYMBOL, VALUE and FLAGS for the table:

STAB    RESB    1100
SYMBOL  EQU     STAB
VALUE   EQU     STAB+6
FLAGS   EQU     STAB+9

We can fetch the VALUE field from the table entry indicated by the contents of register X using

        LDA     VALUE,X

The same symbol definitions using ORG are as follows:

STAB    RESB    1100
        ORG     STAB
SYMBOL  RESB    6
VALUE   RESW    1
FLAGS   RESB    2
        ORG     STAB+1100

The first ORG resets the LOCCTR to the value of STAB, and the last ORG sets the LOCCTR back to its previous value.

Restrictions of EQU and ORG in an ordinary two-pass assembler

For an ordinary two-pass assembler, all symbols must be defined during Pass 1. All terms used to specify the value of a new symbol must therefore have been defined previously in the program. Hence, the sequences marked "not valid" below could not be processed by an ordinary two-pass assembler.

Example 1:
ALPHA   EQU     BETA
BETA    EQU     DELTA
DELTA   RESW    1       (not valid)

Example 2:
        ORG     ALPHA
BYTE1   RESB    1
BYTE2   RESB    1
BYTE3   RESB    1
        ORG
ALPHA   RESB    1       (not valid)

Example 3:
BETA    EQU     ALPHA
ALPHA   RESW    1       (not valid)

Example 4:
ALPHA   RESW    1
BETA    EQU     ALPHA   (valid)

(iii) Expressions

o Most assemblers allow the use of expressions wherever a single operand such as a label or literal is permitted.
o Each such expression must be evaluated by the assembler to produce a single operand address or value.
o Assemblers generally allow arithmetic expressions formed according to the normal rules using the operators +, -, * and /.
o Individual terms in the expression may be constants, user-defined symbols, or special terms.
o The most common special term is the current value of the location counter (often designated by *).

Types of terms
o Absolute terms: the value of an absolute term is independent of program location.
o Relative terms: the value of a relative term depends on the beginning address of the program.

Types of expressions

By the type of value produced, expressions can be classified as follows.

Absolute expressions: The value of an absolute expression is independent of the program location. An absolute expression may contain relative terms, provided the relative terms occur in pairs and the terms in each such pair have opposite signs. No relative term may enter into a multiplication or division operation.
e.g.    MAXLEN  EQU     BUFEND-BUFFER

Relative expressions: The value of a relative expression is relative to the beginning address of the object program. A relative expression can be written as S + r, where S is the starting address of the program and r is the value of the term relative to the beginning of the program.

Expressions that are neither relative nor absolute should be flagged by the assembler as errors.
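The pairing rule above can be checked mechanically by summing the signs of the relative terms. The sketch below is a simplified illustration: `classify` is a hypothetical name, terms are supplied already parsed as (sign, kind) pairs, only + and - are considered, and the criterion for a relative expression (exactly one unpaired relative term with a positive sign) is the usual convention, assumed here because the notes state the pairing rule only for absolute expressions.

```python
def classify(terms):
    """Classify an expression given as a list of (sign, kind) pairs.

    sign is +1 or -1; kind is 'abs' for an absolute term or 'rel' for a
    relative term. Relative terms with opposite signs cancel in pairs.
    """
    net = sum(sign for sign, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"
    return "error"


print(classify([(+1, "rel"), (-1, "rel")]))   # BUFEND-BUFFER
print(classify([(+1, "rel"), (+1, "abs")]))   # STAB+6
print(classify([(+1, "rel"), (+1, "rel")]))   # BUFEND+BUFFER
```

In BUFEND-BUFFER the starting address S appears once with each sign and cancels, which is why the result does not depend on where the program is loaded.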

(iv) Program Blocks It refers to segments of code that are rearranged within a single object program unit. The assembler directive USE is used to define the block for the program statements. The general syntax for USE directive is: USE [block name] Three blocks are used in the assembly language program. They are: o Unnamed Program block (default block) o CDATA block o CBLKS block At the beginning, statements are assumed to be part of the unnamed (default) block. If no USE statements are included, the entire program belongs to this single block. Each program block may actually contain several separate segments of the source program. Implementation of Program Blocks Pass 1 Each program block has a separate location counter Each label is assigned an address that is relative to the start of the block that contains it At the end of Pass 1, the latest value of the location counter for each block indicates the length of that block The assembler can then assign to each block a starting address in the object program Pass 2 The address of each symbol can be computed by adding the assigned block starting address and the relative address of the symbol to that block 2.6 One Pass Assembler The one-pass assembler is used if it is necessary and desirable to avoid a second pass over program. A one-pass assembler scans the program just once. The main problem in trying to assemble a program in one pass involves forward references. All storage reservation statements can be defined before they are referenced. But, forward references to labels on Prepared by P.Vasantha kumari 38

instructions cannot be eliminated as easily. The logic of the program often needs a forward jump. The one-pass assembler must make some special provision for handling forward references. Two types of one-pass assemblers 1. One type of one-pass assemblers produces object code directly in memory for immediate execution. No object program is written out. No loader is needed. 2. The other type of one-pass assemblers produces the usual kind of object program for later execution. Load-and-go assembler The assembler that does not write object program out and does not need a loader is called a Load -and-go assembler. It avoids the overhead of writing the object program out and reading it back in. It is useful in a system that is oriented toward program development and testing. A load-and-go assembler can be a one-pass assembler or a two-pass assembler. Handling of forward references in one-pass load-and-go assembler The assembler generates object code instructions as it scans the source program. If an instruction operand is a symbol that has not yet been defined, the symbol is entered into the symbol table with a flag indicating that the symbol is undefined; the operand address is omitted when the instruction is assembled; the operand address is added to a list of forward references associated with the symbol table entry. When the definition for a symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instructions previously generated. Algorithm Read first input line if OPCODE= START then Prepared by P.Vasantha kumari 39

Save #[OPERAND] a starting address Initialize LOCCTR to starting address Write line to intermediate file Read next input line End (if START) Else Initialize LOCCTR to 0 While OPCODE END do if this is not a comment line then if there is a symbol in the LABEL filed then Search SYMTAB for LABEL if found then if <symbol value> as NULL set <symbol value> as LOCCTR and search the linked list with corresponding operand PTR addresses and generate operand addresses as corresponding symbol values set symbol values as LOOCTR in symbol table and delete linked list End else Insert (LABEL, LOCCTR) into SYMTAB End {if symbol} Search OPTAB for OPCODE if found then Prepared by P.Vasantha kumari 40

Search SYMTAB for OPERAND addresses If found then If symbol value not equal to NULL then Store symbol value as OPERAND address Else Insert at the end of the linked list with a node with address as LOCCTR Else Insert (symbol name, NULL) LOCCTR+=3 End else if OPCODE= WORD then Add 3 to LOCCTR and convert comment to object code else if OPCODE= RESW then Add 3* # [OPERAND] to LOOCTR else if OPCODE= RESB then Add # [OPERAND] to LOOCTR else if OPCODE= BYTE then Find length of constant in bytes Add length to LOOCTR Convert constant to object code End If object code will not fit into current Text Record then Write Text Record to object program initialize new Text Record End Add object code to Text Record End Write listing line Prepared by P.Vasantha kumari 41

    Read next input line
End {while}
Write last Text Record to object program
Write End Record to object program
Write last listing line
End {one pass}

Explanation
Step 1: Read the input line.
Step 2: Check whether the opcode field in the input line is START.
   1. If there is an operand after START, initialize LOCCTR to the operand value.
   2. Otherwise, if the operand field is empty, set LOCCTR to zero.
Step 3: Write the line to the output file.
Step 4: Repeat the following for each remaining line of the input file until the opcode field contains the END directive.
   1. If there is a symbol in the label field:
      i. Check the symbol table to see whether the symbol has already been stored and is marked as an undefined entry. If so, update the symbol table with the proper address, mark the entry as defined, and insert the address into the instructions on its forward reference list.
      ii. Otherwise, enter the symbol into the symbol table along with the memory address at which it is defined.
   2. If there is an opcode in the opcode field:
      i. Search OPTAB for the opcode; if it is present, increment the location counter (LOCCTR) by three.
      ii. a) If the opcode is WORD, increment LOCCTR by three and convert the constant in the operand field to object code.
          b) If the opcode is BYTE, increment LOCCTR by the length of the constant in bytes and convert the constant in the operand field to object code.
          c) If the opcode is RESW, increment LOCCTR by three times the integer value of the operand. Storage is only reserved, so no object code is generated.
          d) If the opcode is RESB, increment LOCCTR by the integer value of the operand. Again, no object code is generated.
   3. If there is a symbol in the operand field:
      i. Check the symbol table to see whether the symbol has already been stored with a defined address. If so, assemble the object code by combining the machine code equivalent of the instruction with the symbol address.
      ii. Otherwise, enter the symbol into the symbol table, mark it as an undefined entry, and add the location of this instruction to its forward reference list.
   4. If there is no symbol in the operand field, the operand address is set to zero and assembled with the machine code equivalent of the instruction.
   5. Write the input line along with the object code to the output file.
Step 5: Close all open files and exit.

Example
Source Program with object code
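Separately from the textbook listing that this example refers to, the forward-reference handling described in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm's definitive form: the function name `assemble`, the tuple input format, the 2-byte address field, and the tiny demo program are all hypothetical, and only a few SIC opcodes are included.

```python
# Real SIC opcode values; everything else is a simplified, hypothetical layout.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "J": 0x3C, "JSUB": 0x48}

def assemble(lines, start=0x1000, size=64):
    """One-pass load-and-go: object code is written straight into `memory`."""
    memory = bytearray(size)
    symtab = {}     # symbol -> address, or None while still undefined
    fwd_refs = {}   # symbol -> addresses of operand fields waiting for its value
    locctr = start

    def store_address(field_addr, value):
        # Simplification: the operand address is stored as 2 bytes after the opcode.
        offset = field_addr - start
        memory[offset:offset + 2] = value.to_bytes(2, "big")

    for label, opcode, operand in lines:
        if label:
            if symtab.get(label) is not None:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
            # Back-patch every instruction that referenced this label earlier.
            for field_addr in fwd_refs.pop(label, []):
                store_address(field_addr, locctr)
        if opcode in OPTAB:
            memory[locctr - start] = OPTAB[opcode]
            if operand:
                if symtab.get(operand) is not None:
                    store_address(locctr + 1, symtab[operand])
                else:
                    # Undefined so far: leave the field empty and remember it.
                    symtab.setdefault(operand, None)
                    fwd_refs.setdefault(operand, []).append(locctr + 1)
            locctr += 3
        elif opcode == "WORD":
            offset = locctr - start
            memory[offset:offset + 3] = int(operand).to_bytes(3, "big")
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)   # reserve only, no object code
        elif opcode == "RESB":
            locctr += int(operand)       # reserve only, no object code
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    undefined = sorted(name for name, addr in symtab.items() if addr is None)
    return memory, symtab, undefined

# Hypothetical demo program: (label, opcode, operand)
program = [
    ("FIRST",  "LDA",  "FIVE"),    # forward reference to FIVE
    ("",       "STA",  "RESULT"),  # forward reference to RESULT
    ("",       "J",    "FIRST"),   # backward reference, resolved at once
    ("FIVE",   "WORD", "5"),
    ("RESULT", "RESW", "1"),
]
memory, symtab, undefined = assemble(program)
```

Any name left in `undefined` after the last line corresponds to a symbol that was referenced but never defined, which a real assembler would report as an error.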

Memory Address   Contents
1000             454F4600 00030000 00xxxxxx xxxxxxxx
1010             xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
...
2000             xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010             100948-- --00100C 28100630 ----48--
2020             --3C2012
...

Symbol Table

2.7 Multi Pass Assembler
A multi-pass assembler translates an assembly language program into machine code (object code) in multiple passes. It is used to resolve forward references in symbol definitions, making as many passes as are necessary to process the definitions of symbols. It is unnecessary for a multi-pass assembler to make more than two

passes over the entire program. Instead, only the parts of the program involving forward references need to be processed in multiple passes. The method presented here can be used to process any kind of forward reference.

If we use a two-pass assembler, the following symbol definitions cannot be allowed.

ALPHA   EQU   BETA
BETA    EQU   DELTA
DELTA   RESW  1

This is because ALPHA and BETA cannot be defined in pass 1. If we allow multi-pass processing, DELTA is defined in pass 1, BETA in pass 2, and ALPHA in pass 3, so the definitions above can be allowed. This is the motivation for using a multi-pass assembler.

A multi-pass assembler uses the symbol table to store symbols that are not yet fully defined. For an undefined symbol, its entry stores the defining expression together with the names and the number of undefined symbols that contribute to the calculation of its value. We also keep, for each symbol, a list of the symbols whose values depend on it. When a symbol becomes defined, we use its value to re-evaluate all of the symbols on this list. This step is performed recursively.

Example
The following symbol-defining statements involve forward references.

HALFSZ  EQU   MAXLEN/2
MAXLEN  EQU   BUFEND-BUFFER
PREVBT  EQU   BUFFER-1
BUFFER  RESB  4096
BUFEND  EQU   *

An ordinary two-pass assembler cannot resolve these symbol definitions. The following figure displays the symbol table entries resulting from pass 1 processing of the first statement.

HALFSZ  EQU   MAXLEN/2
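Before the statement-by-statement walkthrough, the bookkeeping described above (a count of undefined operands per symbol, plus a list of dependent symbols) can be sketched in Python. This is only an illustrative sketch: the class `MultiPassSymtab`, its method names, and the starting LOCCTR value of 1034 (hex) are assumptions, and expressions are evaluated with Python's `eval` restricted to already-defined symbols, which a real assembler would not do.

```python
import re

class MultiPassSymtab:
    def __init__(self):
        self.value = {}       # symbol -> integer value once defined
        self.pending = {}     # symbol -> [expression, number of undefined operands]
        self.dependents = {}  # symbol -> symbols whose expressions mention it

    def _evaluate(self, expr):
        # Integer arithmetic only; names resolve to already-defined symbols.
        return eval(expr.replace("/", "//"), {"__builtins__": {}}, dict(self.value))

    def define(self, name, value):
        """Record a value, then re-evaluate every symbol waiting on it."""
        self.value[name] = value
        for dep in self.dependents.pop(name, []):
            entry = self.pending[dep]
            entry[1] -= 1                      # one fewer undefined operand
            if entry[1] == 0:
                expr = self.pending.pop(dep)[0]
                self.define(dep, self._evaluate(expr))   # recursive step

    def equ(self, name, expr):
        """Process `name EQU expr`, deferring it if operands are undefined."""
        symbols = set(re.findall(r"[A-Za-z_]\w*", expr))
        missing = [s for s in symbols if s not in self.value]
        if not missing:
            self.define(name, self._evaluate(expr))
            return
        self.pending[name] = [expr, len(missing)]
        for s in missing:
            self.dependents.setdefault(s, []).append(name)

table = MultiPassSymtab()
table.equ("HALFSZ", "MAXLEN/2")        # waits on MAXLEN
table.equ("MAXLEN", "BUFEND-BUFFER")   # waits on BUFEND and BUFFER
table.equ("PREVBT", "BUFFER-1")        # waits on BUFFER
locctr = 0x1034                        # assumed location counter value
table.define("BUFFER", locctr)         # BUFFER RESB 4096
locctr += 4096
table.define("BUFEND", locctr)         # BUFEND EQU *
```

Anything still left in `table.pending` after the last statement would correspond to a symbol that remains undefined at the end of the program.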

MAXLEN has not yet been defined, so the value of HALFSZ cannot be computed. The defining expression for HALFSZ is stored in the symbol table in place of its value. The entry &1 indicates that one symbol in the defining expression is undefined. MAXLEN itself is entered as an undefined symbol, with HALFSZ on its list of dependent symbols.

For the next statement,

MAXLEN  EQU   BUFEND-BUFFER

there are two undefined symbols involved in the definition: BUFEND and BUFFER. Both are entered into SYMTAB with lists indicating the dependence of MAXLEN upon them. The symbol table after the second statement is shown in the following figure.

The next figure shows the symbol table after the third statement

PREVBT  EQU   BUFFER-1

In this case, a new undefined symbol PREVBT is added to the symbol table. Since its value depends on BUFFER, PREVBT is added to the list of symbols that depend on BUFFER.

The following figure shows the symbol definitions for the statement

BUFFER  RESB  4096

In this case, the symbol BUFFER is defined. From this symbol definition, PREVBT can be defined accordingly.

The next figure shows the complete symbol table entries after processing the statement

BUFEND  EQU   *

In this case, the symbol BUFEND is defined. The current value of LOCCTR is assigned to BUFEND. From this symbol definition, MAXLEN and HALFSZ can be determined

accordingly. This completes the symbol definition process. If any symbols remain undefined at the end of the program, the assembler flags them as errors.

2.8 Implementation Examples
MASM Assembler
A MASM assembler language program is written as a collection of segments. Each segment is defined as belonging to a particular class, corresponding to its contents. Commonly used classes are CODE, DATA, CONST and STACK.

Segments are addressed via the x86 segment registers during program execution. Code segments are addressed using the code segment register CS, and stack segments are addressed using the stack segment register SS. These segment registers are set automatically by the system loader when a program is loaded for execution. Register CS is set to indicate the segment that contains the starting address specified in the END statement of the program.

Data segments, including constant segments, are addressed using DS, ES, FS, or GS. The segment register can be specified explicitly by the programmer. If the programmer does not specify a segment register, one is selected by the assembler. By default, the assembler assumes that data references use register DS. This assumption can be changed using the assembler directive ASSUME. For example,

ASSUME ES:DATASEG1

tells the assembler to assume that register ES indicates the segment DATASEG1.

Jump instructions are assembled in two different ways, depending on whether the target of the jump is in the same code segment or in a different code segment. A near jump is a jump to a target in the same code segment; a far jump is a jump to a target in a different code segment. A near jump is assembled using the current code segment register CS, whereas a far jump must be assembled using a different segment register, which is specified in an instruction prefix. A near jump instruction occupies 2 or 3 bytes of memory (depending on whether the jump address is within 128 bytes of the current instruction), whereas a far jump occupies 5 bytes.

By default, MASM assumes that a forward jump is a near jump. If the target of the jump is in another code segment, the programmer must warn the assembler by writing

JMP FAR PTR TARGET

If the jump address is within 128 bytes of the current instruction, the programmer can specify the shorter (2-byte) near jump by writing

JMP SHORT TARGET

If the JMP to TARGET is a far jump and the programmer does not specify FAR PTR, a problem occurs. During pass 1, the assembler reserves 3 bytes for the jump instruction, but the actual assembled instruction requires 5 bytes. In earlier versions of MASM, this caused an error. In later versions, the assembler can repeat pass 1 to generate the correct location counter values.

The length of an instruction depends on the operands that are used. Immediate operands may occupy from 1 to 4 bytes in the instruction. An operand that specifies a memory location may take varying amounts of space in the instruction.

Segments in a MASM source program can be written in more than one part. If a SEGMENT directive specifies the same name as a previously defined segment, it is considered to be a continuation of that segment. References between segments are handled by the assembler, and external references are handled by the linker.
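The jump-size rules above, and the pass 1 problem they cause, can be illustrated with a short Python sketch. It assumes only the sizes quoted in the text (2, 3 and 5 bytes) and treats "within 128 bytes" as a signed offset from -128 to 127; the functions `jump_size` and `layout` and the one-byte filler instructions are hypothetical, not part of MASM.

```python
# Sizes quoted in the text for JMP; real encodings depend on the processor mode.
SHORT, NEAR, FAR = 2, 3, 5

def jump_size(same_segment, distance=None):
    """Return the assumed size in bytes of a JMP instruction.

    same_segment: True if the target lies in the current code segment.
    distance: signed byte offset to the target, or None if it is not yet
              known (a forward reference during pass 1).
    """
    if not same_segment:
        return FAR
    if distance is not None and -128 <= distance <= 127:
        return SHORT
    return NEAR

def layout(sizes, start=0):
    """Assign a location counter value to each instruction, given its size."""
    addresses, locctr = [], start
    for size in sizes:
        addresses.append(locctr)
        locctr += size
    return addresses

# Pass 1 guess: the forward jump is assumed near; two 1-byte instructions follow.
guessed = layout([jump_size(same_segment=True), 1, 1])
# Actual: the target turned out to be in another code segment.
actual = layout([jump_size(same_segment=False), 1, 1])
# How far each instruction moved once the jump grew from 3 to 5 bytes.
shift = [a - g for g, a in zip(guessed, actual)]
```

Every instruction after the jump moves by 2 bytes, so all location counter values assigned after it in pass 1 are wrong, which is why the pass has to be repeated.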

The MASM assembler allows easy and efficient execution of programs in a variety of operating system environments. It can also produce an instruction timing listing that shows the number of clock cycles required to execute each machine instruction.