STEVEN R. BAGLEY THE ASSEMBLER

INTRODUCTION
Looking at how to build a computer from scratch.
Started with the NAND gate and worked up until we can build a CPU.
Reached the divide between hardware and software.
Today: looking at how the Assembler works (or Machine Language, as N2T calls it).

[Diagram: the stack of abstraction layers. Software: Thought, Algorithms, C, Assembly, Machine Code, OS. Hardware: CPU; ALU; MUX16, ADD16, OR16; MUX, ADDER, DMUX; AND, OR, NOT; REGISTER; BIT; D FLIP-FLOP; NAND.]

Start off with an abstract idea of what we want the program to do, convert that into algorithms, then into C, and then directly into machine code the hardware can execute. At each step it's getting less abstract and more concrete. Assembly sits at the human side of machine code.

THE ASSEMBLER
Assembly language is a symbolic representation of machine code, in a human-readable form.
An assembler is a tool that takes this symbolic representation and converts it into the binary bit patterns needed by the CPU.
It can also provide help during the conversion, e.g. allowing you to use labels instead of needing to compute the address yourself.

The assembler changes the syntax of the program, not the semantics. Syntax is how the program is expressed; semantics is its meaning. Demo with the N2T Assembler running on a real piece of assembly code.

THE ASSEMBLER
An assembler has many of the same stages as a compiler, but generally in a much simplified form.
Understanding how an assembler works gives us an insight into what the compiler must do.
It also helps us to understand how the bits in an instruction relate to its function, which might help us understand what the CPU is doing on the other side.

Typical structure of an assembler or compiler. The parser is usually split into two phases: firstly, tokenise (break the ASCII characters up into tokens); secondly, use those tokens to build the syntax tree. Then, from the syntax tree, we can generate the machine code for each instruction, i.e. the correct bit patterns.

[Diagram: assembly source -> PARSER -> SYNTAX TREE -> CODE GENERATE -> machine code]

Assembly source:

    M=1
    M=0
    (LOOP)
    @100
    D=D-A
    D;JGT
    M=D+M
    M=M+1
    @LOOP
    0;JMP
    (END)
    0;JMP

Generated machine code:

    0000000000010000
    1110111111001000
    0000000000010001
    1110101010001000
    0000000000010000
    1111110000010000
    0000000001100100
    1110010011010000
    0000000000010010
    1110001100000001
    0000000000010000
    1111110000010000
    0000000000010001
    1111000010001000
    0000000000010000
    1111110111001000
    0000000000000100
    1110101010000111
    0000000000010010
    1110101010000111
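The tokenising phase can be sketched in a few lines of Python. This is only an illustrative sketch, not the N2T assembler's own code: the function name `tokenise` and the token pattern are assumptions made for this example.

```python
import re

# Hypothetical token pattern: a run of digits, a symbol name, or a single
# punctuation/operator character. Leading white space is skipped.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_.$:][\w.$:]*|[@=;()+\-&|!])")

def tokenise(line):
    """Break one line of Hack assembly into a list of token strings."""
    tokens = []
    line = line.rstrip()
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} in {line!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

print(tokenise("D=D-A"))    # ['D', '=', 'D', '-', 'A']
print(tokenise("@100"))     # ['@', '100']
print(tokenise("(LOOP)"))   # ['(', 'LOOP', ')']
print(tokenise("D; JGT"))   # ['D', ';', 'JGT']
```

A later stage would look at the token list to decide what kind of line it has and which fields are present.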

ASSEMBLER SYNTAX
Assembly languages almost always use a rigid syntax: one instruction per line.
There is a one-to-one mapping between assembly instruction and generated machine code.
This makes writing the parser much simpler.
Support for labels complicates things slightly.

In fact, you could get away without a syntax tree for an assembler, since there is a one-to-one mapping between assembler instruction and machine code pattern. It also makes two-pass assembly simpler, as we'll see later.

HACK ASSEMBLER SYNTAX
Hack assembler syntax is simple. Each line can contain either:
An instruction (an A-instruction or a C-instruction)
A label

There are two different instruction types. A label just labels a particular point in the program.

ASSEMBLER OPERATION
The basic operation of the assembler, then, is straightforward:
While not end of file:
  Read a line from the file
  Determine the type of line (Parser): it could be an instruction or a label
  If an instruction, generate the correct bit pattern for the instruction (Code Generate)
  If a label, note the position of the label in the generated output
  And repeat

We'll ignore comments here; the assembler simply skips them in the input. The last point means we need to know what address each instruction is generated at.
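The loop above could be sketched as follows. This is a minimal sketch under stated assumptions: the source is passed in as a list of strings rather than read from a file, comments are assumed to have been removed already, and `generate` is a hypothetical caller-supplied function standing in for the code generator. The short program at the bottom is made up purely to exercise the loop.

```python
def assemble_lines(lines, generate):
    """Walk the source once, emitting code and noting label positions.

    `generate` (hypothetical) turns one instruction string into its
    bit pattern; here it is supplied by the caller.
    """
    output = []   # generated machine words, in order
    labels = {}   # label name -> address of the instruction that follows it
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith("(") and line.endswith(")"):
            # A label generates no code. It names the next instruction,
            # whose address is the number of words emitted so far.
            labels[line[1:-1]] = len(output)
        else:
            output.append(generate(line))
    return output, labels

# A made-up program, with a stand-in generator that returns the text unchanged.
program = ["@100", "D=A", "(LOOP)", "D=D-1", "@LOOP", "D;JGT",
           "(END)", "@END", "0;JMP"]
words, labels = assemble_lines(program, generate=lambda text: text)
print(labels)       # {'LOOP': 2, 'END': 5}
print(len(words))   # 7
```

Using the length of the output list as the current address is what makes "note the position of the label" a one-line operation.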

PARSING
Going to use a sample assembly file as an example and see how we would parse each line.
We will assume that white space has already been stripped from the line, so the first character of the buffer will be the first character of the instruction.
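The stripping step assumed here might look like the sketch below. The function name `clean` is made up for this example, and it assumes Hack's `//` comment syntax.

```python
def clean(line):
    """Remove a trailing // comment and all white space from one source line."""
    code = line.split("//", 1)[0]   # keep only the text before any comment
    return "".join(code.split())    # split() with no arguments drops all white space

print(repr(clean("   D; JGT   // jump if D > 0")))   # 'D;JGT'
print(repr(clean("// a comment-only line")))         # ''
```

After this step, a blank or comment-only line is an empty string, and any other line starts with the first character of the instruction or label.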

    M=1
    M=0
    (LOOP)
    @100
    D=D-A
    D;JGT
    M=D+M
    M=M+1
    @LOOP
    0;JMP
    (END)
    0;JMP

Use this program as an example. Follow through how we convert the different types of instruction as we see them. The first thing we have is an A-instruction.

PARSING A-INSTRUCTION
In assembly language, all A-instructions start with an @ symbol, so if a line starts with an @, it has to be an A-instruction.
A-instructions load the A register with a value. The value is the second half of the instruction (the part after the @).
The assembler lets it be either a literal value, the address of a label, or the name of a variable.

In the case of a label or variable name, we need to calculate the address from the name. We'll look at that later.

PARSING A-INSTRUCTION
It is easy to tell whether it's a value or a label: values are a series of digits.
If the first character after the @ is a digit, then it must be a value.
This is why most programming languages don't let you start a label with a digit: it would be ambiguous whether it was a literal value or part of a label without parsing the whole label.

The aim is to make the programming language easy to understand.
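The first-character test can be sketched like this. The function name and the `('value', ...)` / `('symbol', ...)` return convention are assumptions made for illustration; looking up the address of a symbol is left out.

```python
DIGITS = "0123456789"

def parse_a_instruction(line):
    """Classify the operand of an A-instruction such as '@100' or '@LOOP'.

    Returns ('value', int) for a literal, or ('symbol', str) for a label
    or variable name whose address must be looked up separately.
    """
    if not line.startswith("@") or len(line) < 2:
        raise ValueError(f"not an A-instruction: {line!r}")
    operand = line[1:]
    if operand[0] in DIGITS:
        # Starts with a digit, so the whole operand must be a number.
        if any(ch not in DIGITS for ch in operand):
            raise ValueError(f"bad literal value: {operand!r}")
        return ("value", int(operand))
    return ("symbol", operand)

print(parse_a_instruction("@100"))    # ('value', 100)
print(parse_a_instruction("@LOOP"))   # ('symbol', 'LOOP')
```

Only one character has to be inspected to choose the branch; the rest of the operand is then checked against that choice.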

CODE GENERATING A-INSTRUCTION
We can extract the value (in this case 100) from the line and convert it to an integer.
Then we need to generate the correct bit pattern for an A-instruction.

A-INSTRUCTION
The A-instruction is used to set the A register to a 15-bit value.
Assembler syntax: @value
Binary: 0vvv vvvv vvvv vvvv, where the 15 v's stand for the 15 bits of the binary value.
So @5 loads A with the value 5.
Binary: 0000 0000 0000 0101

CODE GENERATING A-INSTRUCTION
We need to make sure the value can fit in 15 bits, since this is all we have space to encode in the instruction.
If it can't, then we have an error: we need to flag it and stop assembling.
The next step is to produce the correct bit pattern for the instruction.
The most significant bit must be zero to signify that it is an A-instruction.
The rest of the bits (0 to 14) are just the binary number for the value.

15 bits allow us to store all the non-negative numbers you can fit in a signed 16-bit register (0 to 32767); we'd need to take a different approach to store a negative number.
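The range check and bit-pattern generation can be sketched as below. The function name is hypothetical, and returning the pattern as a string of 0s and 1s is a choice made for this example.

```python
def encode_a_instruction(value):
    """Return the 16-bit pattern for '@value' as a string of 0s and 1s."""
    if not 0 <= value <= 0x7FFF:   # 0x7FFF == 32767, the largest 15-bit value
        raise ValueError(f"value {value} does not fit in 15 bits")
    # Bit 15 is 0 (marks an A-instruction); bits 0 to 14 hold the value.
    return format(value, "016b")

print(encode_a_instruction(5))     # 0000000000000101
print(encode_a_instruction(100))   # 0000000001100100
```

Because the value is limited to 15 bits, zero-padding to 16 digits leaves the most significant bit at 0 without any extra work.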

HACK CPU INTERNALS
It's effectively the opposite of what happens in the CPU: the assembler produces the bit patterns, and the instruction decoder looks at the bit pattern to work out which bits of the CPU to turn on (or off).

Demo how this works on the screen. Demo how to write an assembler.
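To illustrate the decoder's side of this, here is a sketch that splits a 16-bit word back into its fields, assuming the Hack layouts 0vvv vvvv vvvv vvvv for A-instructions and 111a c1..c6 d1..d3 j1..j3 for C-instructions. It is a software illustration only; the function name and the dictionary it returns are made up for this example and say nothing about how the hardware decoder is wired.

```python
def decode(word):
    """Split a 16-bit instruction word (an int) into its named fields."""
    if ((word >> 15) & 1) == 0:
        # Top bit clear: an A-instruction carrying a 15-bit value.
        return {"type": "A", "value": word & 0x7FFF}
    return {
        "type": "C",
        "a":    (word >> 12) & 0b1,
        "comp": (word >> 6) & 0b111111,   # c1..c6
        "dest": (word >> 3) & 0b111,      # d1..d3
        "jump": word & 0b111,             # j1..j3
    }

print(decode(0b0000000001100100))
# {'type': 'A', 'value': 100}
print(decode(0b1110010011010000))
# {'type': 'C', 'a': 0, 'comp': 19, 'dest': 2, 'jump': 0}
```

The second word is the pattern for D=D-A: comp 19 is 010011 in binary and dest 2 is 010, the D register.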

PARSING C-INSTRUCTION
The next instruction is a C-instruction.
The format of these can vary immensely, which makes it trickier to write a parser for them. The first two are relatively straightforward and similar, but some of the others are radically different.
On the other hand, we can easily tell if a line is a label or an A-instruction, so we'll assume for this implementation that any other non-blank line is a C-instruction.
Once we know it is a C-instruction, we can start to break it down.

C INSTRUCTION
Does everything else.
Assembler syntax: dest=comp;jump
Either the dest field or the jump field can be omitted.
comp is some computation, specified by the c bits (c1 to c6) below.
Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
a switches one side of the computation between the A register (when 0) and M (when 1). The other side of the computation is always D.

In our example the jump is omitted (so no semicolon).

PARSING C-INSTRUCTION
C-instructions are split around the ;
The left-hand side contains the ALU operation to perform, including optionally updating a value stored in a register/memory.
The right-hand side specifies whether to jump or not.
The right-hand side is optional; the ; can only be omitted if the jump isn't present.

PARSING C-INSTRUCTION
We can effectively split the parsing in two around the ;
Parse the right-hand side to work out the jump: this is just string comparison (to find the correct value).
Parse the left-hand side to work out the ALU operation and register/memory updates.
Look for = in the left-hand side; if found, parse the left-hand side of the = to find what to update.
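The two-way split described above can be sketched with Python's `str.partition`. The function name is hypothetical, and the line is assumed to have had its white space removed already.

```python
def split_c_instruction(line):
    """Split 'dest=comp;jump' into its (dest, comp, jump) text fields.

    A missing dest or jump field comes back as an empty string.
    """
    left, _, jump = line.partition(";")    # jump is '' when there is no ';'
    if "=" in left:
        dest, _, comp = left.partition("=")
    else:
        dest, comp = "", left               # no '=', so nothing is updated
    return dest, comp, jump

print(split_c_instruction("M=1"))     # ('M', '1', '')
print(split_c_instruction("D;JGT"))   # ('', 'D', 'JGT')
print(split_c_instruction("0;JMP"))   # ('', '0', 'JMP')
```

Each of the three fields can then be converted to bits independently of the others.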

C INSTRUCTION: COMPUTATION
dest = (when a=0)   c1 c2 c3 c4 c5 c6   dest = (when a=1)
0                   1  0  1  0  1  0    0
1                   1  1  1  1  1  1    1
-1                  1  1  1  0  1  0    -1
D                   0  0  1  1  0  0    D
A                   1  1  0  0  0  0    M
!D                  0  0  1  1  0  1    !D
!A                  1  1  0  0  0  1    !M
-D                  0  0  1  1  1  1    -D
-A                  1  1  0  0  1  1    -M
D+1                 0  1  1  1  1  1    D+1
A+1                 1  1  0  1  1  1    M+1
D-1                 0  0  1  1  1  0    D-1
A-1                 1  1  0  0  1  0    M-1
D+A                 0  0  0  0  1  0    D+M
D-A                 0  1  0  0  1  1    D-M
A-D                 0  0  0  1  1  1    M-D
D&A                 0  0  0  0  0  0    D&M
D|A                 0  1  0  1  0  1    D|M

As used by the ALU. The c bits select what operation is placed into the destination. These are the same bit patterns that control the ALU we designed earlier. We can connect these bits up to the ALU, and the output to whatever destination we want.
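One way to turn the table above into code is a dictionary keyed by the a=0 spelling, with the a bit worked out from whether the mnemonic mentions M. This is a sketch of one possible encoding, not the only one; the names `COMP_BITS` and `encode_comp` are made up, and computations that involve neither A nor M are emitted with a=0.

```python
# c1..c6 patterns from the table, keyed by the a=0 spelling.
COMP_BITS = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}

def encode_comp(comp):
    """Return the 7 bits 'a c1 c2 c3 c4 c5 c6' for a comp mnemonic."""
    # An M in the mnemonic means a=1; the c bits are those of the same
    # operation written with A in place of M.
    a = "1" if "M" in comp else "0"
    key = comp.replace("M", "A")
    if key not in COMP_BITS:
        raise ValueError(f"unknown computation: {comp!r}")
    return a + COMP_BITS[key]

print(encode_comp("D-A"))   # 0010011
print(encode_comp("D+M"))   # 1000010
print(encode_comp("M+1"))   # 1110111
```

Storing only one column of the table works because the a=1 column differs from the a=0 column only by reading M where the other reads A.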

C INSTRUCTION: DESTINATION
d1 d2 d3   destination
0  0  0    null   not stored
0  0  1    M      RAM[A] updated
0  1  0    D      D register updated
0  1  1    MD     RAM[A] and D updated
1  0  0    A      A register updated
1  0  1    AM     A and RAM[A] updated
1  1  0    AD     A and D registers updated
1  1  1    AMD    A, D and RAM[A] updated

Each destination bit basically describes whether one of the three possible destinations is updated (e.g. memory is updated whenever d3 is set).
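Since each bit stands for one destination, the dest field can be encoded without a lookup table. In this sketch the function name is hypothetical, and it accepts the letters in any order, which is more lenient than the spellings listed in the table.

```python
def encode_dest(dest):
    """Return the three bits 'd1 d2 d3' for a dest field such as 'MD' or ''."""
    if any(ch not in "AMD" for ch in dest) or len(set(dest)) != len(dest):
        raise ValueError(f"bad destination: {dest!r}")
    d1 = "1" if "A" in dest else "0"   # A register updated
    d2 = "1" if "D" in dest else "0"   # D register updated
    d3 = "1" if "M" in dest else "0"   # RAM[A] updated
    return d1 + d2 + d3

print(encode_dest(""))      # 000
print(encode_dest("M"))     # 001
print(encode_dest("MD"))    # 011
print(encode_dest("AMD"))   # 111
```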

C INSTRUCTION: JUMP
j1 j2 j3   mnemonic   effect
0  0  0    null       No jump
0  0  1    JGT        If out > 0 then jump
0  1  0    JEQ        If out = 0 then jump
0  1  1    JGE        If out >= 0 then jump
1  0  0    JLT        If out < 0 then jump
1  0  1    JNE        If out != 0 then jump
1  1  0    JLE        If out <= 0 then jump
1  1  1    JMP        Always jump

We can choose between
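The jump field is a plain string lookup. The sketch below also includes a small helper showing how the three bits can be read off the table: j1 covers out < 0, j2 covers out = 0 and j3 covers out > 0. The names `JUMP_BITS`, `encode_jump` and `should_jump` are made up for this example.

```python
JUMP_BITS = {
    "": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
    "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111",
}

def encode_jump(jump):
    """Return the three bits 'j1 j2 j3' for a jump mnemonic ('' means no jump)."""
    if jump not in JUMP_BITS:
        raise ValueError(f"unknown jump: {jump!r}")
    return JUMP_BITS[jump]

def should_jump(jump_bits, out):
    """Apply the jump bits to an ALU output value, as the table describes."""
    j1, j2, j3 = (bit == "1" for bit in jump_bits)
    return (j1 and out < 0) or (j2 and out == 0) or (j3 and out > 0)

print(encode_jump("JGT"))                    # 001
print(should_jump(encode_jump("JGE"), 0))    # True
print(should_jump(encode_jump("JGE"), -3))   # False
```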

CODE GENERATION C-INSTRUCTION
Again, this is just a matter of setting the correct bits based on the input.
This time the 16 bits of the instruction are split into groups, and we need to consider each of the groups separately.
Start with the simple ones: the jump bits and the destination bits.
Parsing is more complex for the ALU control bits.

This is exactly the same as when dealing with the CPU implementation.
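Once each group has been worked out, putting the instruction together is just concatenation in the order 111, a and c1..c6, d1..d3, j1..j3. The sketch below assumes the groups arrive as strings of 0s and 1s; the function name is hypothetical.

```python
def pack_c_instruction(a_comp, dest, jump):
    """Combine the bit groups of a C-instruction into one 16-bit string.

    a_comp is 7 bits (a, then c1..c6); dest and jump are 3 bits each.
    """
    if (len(a_comp), len(dest), len(jump)) != (7, 3, 3):
        raise ValueError("expected 7 comp bits, 3 dest bits and 3 jump bits")
    return "111" + a_comp + dest + jump

print(pack_c_instruction("0111111", "001", "000"))   # M=1   -> 1110111111001000
print(pack_c_instruction("0001100", "000", "001"))   # D;JGT -> 1110001100000001
```

Both results match the corresponding words in the generated machine code shown earlier.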