STEVEN R. BAGLEY THE ASSEMBLER


INTRODUCTION
Looking at how to build a computer from scratch
Started with the NAND gate and worked up until we can build a CPU
Reached the divide between hardware and software
Today, looking at how the Assembler works
Or Machine Language, as N2T calls it

[Diagram: layers of abstraction. Software: Thought, Algorithms, C, Assembly, Machine Code, OS. Hardware: CPU, ALU, MUX16, ADD16, OR16, MUX, Adder, DMUX, AND, OR, NOT, Register, Bit, D Flip-Flop, NAND.]
Start off with an abstract idea of what we want the program to do, convert that into algorithms, then into C, and then directly into machine code the hardware can execute
At each step it's getting less abstract and more concrete
Assembly sits at the human side of machine code

THE ASSEMBLER
Assembly language is a symbolic representation of machine code, in a human-readable form
An Assembler is a tool that takes this symbolic representation and converts it into the binary bit patterns needed by the CPU
Can also provide help during the conversion, e.g. allowing you to use labels instead of needing to compute addresses yourself
Changing the syntax of the program, not the semantics
Syntax: how it's expressed. Semantics: its meaning
Demo with the N2T Assembler running on a real piece of assembly code

THE ASSEMBLER
An assembler has many of the same stages as a compiler, but generally in a much simplified form
Understanding how an assembler works gives us an insight into what the compiler must do
Also helps us to understand how the bits in an instruction relate to its function
Which might help us understand what the CPU is doing on the other side

[Diagram: assembly source -> PARSER -> SYNTAX TREE -> CODE GENERATE -> machine code]

Assembly source:
M=1
M=0
(LOOP)
@100
D=D-A
D;JGT
M=D+M
M=M+1
@LOOP
0;JMP
(END)
0;JMP

Generated machine code:
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000001100100
1110010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
1111000010001000
0000000000010000
1111110111001000
0000000000000100
1110101010000111
0000000000010010
1110101010000111

Typical structure of an Assembler or compiler
Parser usually split into two phases
Firstly, tokenise: break the ASCII characters up into tokens
Secondly, use those tokens to build the syntax tree
Then from the syntax tree we can generate the machine code for each instruction, i.e. the correct bit patterns

ASSEMBLER SYNTAX
Assembly languages almost always use a rigid syntax
One instruction per line
One-to-one mapping between assembly instruction and generated machine code
Makes writing the parser much simpler
Support for labels, which complicates things slightly
In fact, you could get away without a syntax tree for an assembler, since there is a one-to-one mapping between assembler instruction and machine code pattern
It also makes two-pass assembly simpler, as we'll see later

HACK ASSEMBLER SYNTAX
Hack assembler syntax is simple
Each line can contain either:
  An instruction
    A-instruction
    C-instruction
  A label
Two different instruction types
A label just labels a particular point in the program

ASSEMBLER OPERATION
Basic operation of the Assembler is then straightforward
While not end of file:
  Read a line from file
  Determine type of line (Parser)
    Could be an instruction or a label
  If an instruction, generate the correct bit pattern for the instruction (Code Generate)
  If a label, note the position of the label in the generated output
  And repeat
Comments are simply skipped on input, so we'll ignore them here
The last point means we need to know what address each instruction is generated at
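
A minimal sketch of the loop above in Python, not taken from the slides. The names first_pass, labels and instructions are made up for illustration, and it assumes Hack's // comment syntax. It only classifies each line and records label positions; generating the bit patterns is left to later steps.

```python
def first_pass(lines):
    """Classify each source line and note the address each label refers to."""
    labels = {}        # label name -> address of the next generated instruction
    instructions = []  # (kind, text) pairs in output order; kind is "A" or "C"
    for raw in lines:
        # Drop any // comment, then remove all white space from the line.
        text = "".join(raw.split("//", 1)[0].split())
        if not text:
            continue  # blank or comment-only line generates nothing
        if text.startswith("(") and text.endswith(")"):
            # A label takes no space in the output: it names the next address.
            labels[text[1:-1]] = len(instructions)
        elif text.startswith("@"):
            instructions.append(("A", text[1:]))
        else:
            # Assumption from the slides: anything else is a C-instruction.
            instructions.append(("C", text))
    return labels, instructions
```

Because labels generate no output, the address of a label is simply the number of instructions seen so far.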

PARSING
Going to use a sample assembly file as an example
See how we would parse each line
Will assume that white space has already been stripped from the line
First character of the buffer will be the first character of the instruction

M=1
M=0
(LOOP)
@100
D=D-A
D;JGT
M=D+M
M=M+1
@LOOP
0;JMP
(END)
0;JMP

Use this program as an example
Follow through how we convert the different types of instruction as we see them
First thing we have is an A-instruction

PARSING A-INSTRUCTION
Assembly language for all A-instructions starts with an @ symbol
So if a line starts with an @, it has to be an A-instruction
A-instructions load the A register with a value
The value is the second half of the instruction
Assembler lets it be either:
  A literal value
  The address of a label
  The name of a variable
In the case of a label or variable name, we need to calculate the address from the name
Look at that later

PARSING A-INSTRUCTION
Easy to tell if it's a value or a label
Values are a series of digits
If the first character after the @ is a digit, then it must be a value
This is why most programming languages don't let you start a label with a digit
Would be ambiguous whether it was a literal value or part of a label without parsing the whole label
Aim is to make the programming language easy to understand
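
The first-character test above can be sketched as follows. This is an illustration only: classify_a_operand is a made-up name, and rejecting operands that start with a digit but contain other characters is an assumption about how strict the assembler should be.

```python
DIGITS = "0123456789"

def classify_a_operand(line):
    """Decide whether the text after '@' is a literal value or a symbol."""
    if not line.startswith("@") or len(line) < 2:
        raise ValueError(f"not an A-instruction: {line!r}")
    operand = line[1:]
    if operand[0] in DIGITS:
        # Starts with a digit, so it must be a literal value throughout.
        if not all(ch in DIGITS for ch in operand):
            raise ValueError(f"malformed literal: {operand!r}")
        return ("literal", int(operand))
    # Otherwise it names a label or variable, to be resolved to an address later.
    return ("symbol", operand)
```

Only one character needs to be inspected to choose the branch, which is the point the slide makes about labels not starting with a digit.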

CODE GENERATING A-INSTRUCTION
Can extract the value (in this case 100) from the line and convert it to an integer
Then need to generate the correct bit pattern for an A-instruction

A-INSTRUCTION
The A-instruction is used to set the A register to a 15-bit value
Assembler syntax: @value
Binary: 0vvv vvvv vvvv vvvv
where the 15 v's are the 15 bits of the binary value
So @5 loads A with the value 5
Binary: 0000 0000 0000 0101

CODE GENERATING A-INSTRUCTION
Need to make sure the value can fit in 15 bits, since this is all we have space to encode in the instruction
If it can't, then we have an error: need to flag it and stop assembling
Next step is to produce the correct bit pattern for the instruction
Most significant bit must be zero to signify that it is an A-instruction
Rest of the bits (0 to 14) are just the binary number for the value
15 bits lets us store all the non-negative numbers that fit in a 16-bit two's-complement register (0 to 32767); we'd need to take a different approach to store a negative number
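
A short sketch of the range check and bit pattern described above. The function name encode_a_instruction is made up for illustration, and raising ValueError stands in for "flag it and stop assembling".

```python
def encode_a_instruction(value):
    """Return the 16-bit pattern for @value as a string of 0s and 1s."""
    # Only 15 bits are available, so the value must lie in 0..32767.
    if not 0 <= value <= 0x7FFF:
        raise ValueError(f"value {value} does not fit in 15 bits")
    # Zero-padding to 16 digits leaves the most significant bit as 0,
    # which is what marks the word as an A-instruction.
    return format(value, "016b")
```

For example, encode_a_instruction(5) gives 0000000000000101 and encode_a_instruction(100) gives 0000000001100100, the pattern for @100 in the sample program.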

HACK CPU INTERNALS
It's effectively the opposite of what happens in the CPU
The assembler produces the bit patterns
The instruction decoder looks at the bit pattern to work out which bits of the CPU to turn on (or off)
Demo how this works on the screen
Demo how to write an assembler

PARSING C-INSTRUCTION
Next instruction is a C-instruction
Format of these can vary immensely
First two are relatively straightforward and similar, but some of the others are radically different
Makes it trickier to write a parser for it
On the other hand, we can easily tell if a line is a label or an A-instruction
So we'll assume for this implementation that any other non-blank line is a C-instruction
Once we know it is a C-instruction we can start to break it down

C INSTRUCTION
Does everything else
Assembler syntax: dest=comp;jump
Either the dest field or the jump field can be omitted
comp is some computation, specified by the c bits below
Binary: 111a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
a switches one side of the computation between the A register (when 0) and M (when 1)
The other side of the computation is always D
In our example the jump is omitted (so no semicolon)

PARSING C-INSTRUCTION
C-instructions are split around the ;
Left-hand side contains the ALU operation to perform, optionally including updating a value stored in a register/memory
Right-hand side specifies whether to jump or not
Right-hand side is optional
The ; can only be omitted if the jump isn't present

PARSING C-INSTRUCTION
Can effectively split the parsing in two around the ;
Parse right-hand side to work out the jump
  Just string comparison (to find out the correct value)
Parse left-hand side to work out the ALU operation and register/memory updates
  Look for = in the left-hand side
  If found, parse the left-hand side of the = to find what to update
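
The two-way split above can be sketched like this. split_c_instruction is a hypothetical helper, not something from the slides; it returns empty strings for an omitted dest or jump field.

```python
def split_c_instruction(line):
    """Split 'dest=comp;jump' into its three fields; missing fields become ''."""
    text = "".join(line.split())          # ignore any white space in the line
    rest, _, jump = text.partition(";")   # right-hand side of ';' is the jump
    dest, equals, comp = rest.partition("=")
    if not equals:
        # No '=' present, so the whole left-hand side is the computation.
        dest, comp = "", rest
    return dest, comp, jump
```

With the sample program, split_c_instruction("M=D+M") gives ("M", "D+M", "") and split_c_instruction("D;JGT") gives ("", "D", "JGT").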

C INSTRUCTION: COMPUTATION

comp (a=0)   c1 c2 c3 c4 c5 c6   comp (a=1)
0            1  0  1  0  1  0    0
1            1  1  1  1  1  1    1
-1           1  1  1  0  1  0    -1
D            0  0  1  1  0  0    D
A            1  1  0  0  0  0    M
!D           0  0  1  1  0  1    !D
!A           1  1  0  0  0  1    !M
-D           0  0  1  1  1  1    -D
-A           1  1  0  0  1  1    -M
D+1          0  1  1  1  1  1    D+1
A+1          1  1  0  1  1  1    M+1
D-1          0  0  1  1  1  0    D-1
A-1          1  1  0  0  1  0    M-1
D+A          0  0  0  0  1  0    D+M
D-A          0  1  0  0  1  1    D-M
A-D          0  0  0  1  1  1    M-D
D&A          0  0  0  0  0  0    D&M
D|A          0  1  0  1  0  1    D|M

As used by the ALU
c bits select what operation is placed into the destination
These are the same bit patterns that control the ALU we designed earlier
Can connect these bits up to the ALU, and the output to whatever destination we want
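
One way to turn the table above into code, as a sketch only. COMP_BITS and encode_comp are made-up names. It stores just the a=0 column and derives the a=1 forms by swapping M for A, and it assumes the mnemonic is written exactly as in the table (so D+A is accepted but A+D is not).

```python
# c1..c6 for each computation, keyed by its a=0 spelling from the table.
COMP_BITS = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000",
    "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011",
    "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010",
    "D+A": "000010", "D-A": "010011", "A-D": "000111",
    "D&A": "000000", "D|A": "010101",
}

def encode_comp(comp):
    """Return the 7 bits 'a c1 c2 c3 c4 c5 c6' for a comp mnemonic."""
    # Any computation mentioning M uses memory instead of A, so a = 1.
    a_bit = "1" if "M" in comp else "0"
    key = comp.replace("M", "A")
    if key not in COMP_BITS:
        raise ValueError(f"unknown computation: {comp!r}")
    return a_bit + COMP_BITS[key]
```

A mixed form such as A+M becomes the key A+A, which is not in the table, so it is rejected rather than silently encoded.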

C INSTRUCTION: DESTINATION

d1 d2 d3   destination   effect
0  0  0    null          not stored
0  0  1    M             RAM[A] updated
0  1  0    D             D register updated
0  1  1    MD            RAM[A] and D updated
1  0  0    A             A register updated
1  0  1    AM            A and RAM[A] updated
1  1  0    AD            A and D registers updated
1  1  1    AMD           A, D and RAM[A] updated

Each destination bit describes whether one of the three possible destinations is updated (e.g. memory is updated whenever d3 is set)
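
Since each bit stands for one destination, the table does not need to be stored at all. A sketch under that reading, with encode_dest as a made-up name; accepting the letters in any order (DM as well as MD) is an assumption that goes beyond the spellings listed in the table.

```python
def encode_dest(dest):
    """Return 'd1 d2 d3' for a dest field such as '', 'M', 'MD' or 'AMD'."""
    if set(dest) - set("ADM"):
        raise ValueError(f"unknown destination: {dest!r}")
    bits = 0
    # d1 selects A, d2 selects D, d3 selects M (that is, RAM[A]).
    for register, mask in (("A", 0b100), ("D", 0b010), ("M", 0b001)):
        if register in dest:
            bits |= mask
    return format(bits, "03b")
```

An empty dest field gives 000, the "not stored" row of the table.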

C INSTRUCTION: JUMP

j1 j2 j3   mnemonic   effect
0  0  0    null       No jump
0  0  1    JGT        If out > 0 then jump
0  1  0    JEQ        If out = 0 then jump
0  1  1    JGE        If out >= 0 then jump
1  0  0    JLT        If out < 0 then jump
1  0  1    JNE        If out != 0 then jump
1  1  0    JLE        If out <= 0 then jump
1  1  1    JMP        Always jump

We can choose between
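
The jump field really is just a string comparison, so a dictionary lookup is enough. A minimal sketch; JUMP_BITS and encode_jump are names invented here, and the empty string stands for an omitted jump.

```python
# j1 j2 j3 for each jump mnemonic; '' means no jump was written.
JUMP_BITS = {
    "": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
    "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111",
}

def encode_jump(jump):
    """Return 'j1 j2 j3' for a jump mnemonic, or raise on an unknown one."""
    try:
        return JUMP_BITS[jump]
    except KeyError:
        raise ValueError(f"unknown jump: {jump!r}") from None
```

Read as flags, j1 means "jump if out < 0", j2 "jump if out = 0" and j3 "jump if out > 0", which is why JMP sets all three.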

CODE GENERATION C-INSTRUCTION
Again just a matter of setting the correct bits based on the input
This time the 16 bits of the instruction are split into groups
Need to consider each of the groups separately
Start with the simple ones
  Jump bits
  Destination bits
Parsing is more complex for the ALU control bits
Exactly the same when dealing with the CPU implementation
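
Once each group has been worked out, the final word is just the groups placed side by side. A sketch of that last step, assuming the three fields have already been encoded as bit strings; pack_c_instruction is a hypothetical name.

```python
def pack_c_instruction(comp_bits, dest_bits, jump_bits):
    """Join 'a c1..c6', 'd1 d2 d3' and 'j1 j2 j3' into one 16-bit word."""
    if (len(comp_bits), len(dest_bits), len(jump_bits)) != (7, 3, 3):
        raise ValueError("expected 7 comp bits, 3 dest bits and 3 jump bits")
    # The top three bits of every C-instruction are 1.
    word = "111" + comp_bits + dest_bits + jump_bits
    if set(word) - {"0", "1"}:
        raise ValueError("fields must contain only 0s and 1s")
    return word
```

For D=D-A the fields are 0010011, 010 and 000, giving 1110010011010000, which matches the corresponding word in the generated machine code for the sample program.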
