MACHINE LANGUAGE AND ASSEMBLY LANGUAGE

Charla King
5 years ago

CHAPTER 4: MACHINE LANGUAGE AND ASSEMBLY LANGUAGE

4.1 Introduction

In this chapter we explore the inner workings of a computer system. The central processing unit (CPU) is where program execution takes place. The CPU is relatively complex, but it can be viewed as an extension of a simple finite state machine (FSM). A key distinction between a CPU and an FSM is that the CPU needs both data and instructions at its input. The instructions select the operations to be performed on the data, such as addition, subtraction, Boolean AND, etc. An ordinary FSM only needs data at its input, and from the viewpoint of an FSM, instructions are just another form of data. While this is true in principle, in practice we consider instructions and data to be different.

The instruction set, with its corresponding encoding for each instruction, is a defining characteristic of a CPU. For this reason, computer systems are often identified by the type of CPU they incorporate. The instruction set determines which programs the system can execute and has a significant impact on system performance. Programs compiled for an IBM PC (or compatible) system use the instruction set of an 80x86 CPU, where the x is replaced with a digit that corresponds to the version, such as 80486. These programs will not run on an Apple Macintosh, since the Macintosh executes the instruction set of the 680x0 CPU (where the x is again replaced with a digit that corresponds to the version, such as 68040). This does not mean, however, that all computer systems that use the same CPU can execute the same programs. A program written for a Sega Genesis will not execute on a 680x0-based Macintosh without extensive modifications. There is a growing movement toward compatibility among CPU types, such as the PowerPC CPU, which supports both the IBM PC and Apple Macintosh environments in addition to its own native instruction set. The movement is far from all-encompassing, however, as evidenced by the incompatibilities among the PowerPC, the Pentium, the SPARC, and other CPUs that postdate the 80x86 and 680x0 families.

A compiler is a computer program that transforms a program written in a high-level language that we understand, like C, Pascal, or Fortran, into a language that a computer understands, called machine language. A machine language is written in the instruction set of the CPU, with each instruction encoded in binary. Machine languages differ from one machine to the next, and a C compiler for one system will produce a different compiled program than a C compiler for another system produces from the same source program. In fact, different C compilers for the same machine can produce different compiled programs for the same source code, as we will see.

In the process of compiling a program (referred to as the translation process), a high-level source program is transformed into an intermediate form called assembly language, which is then translated into object code for the target machine by an assembler. In the interpretation process, a program is executed by the computer: the object code is executed one instruction at a time, as directed by a control unit within the CPU.

High-level languages allow us to treat the target computer architecture that executes our programs as an abstraction. At the machine language level, however, we are very much concerned with the underlying architecture. A program written in a high-level language like C, Pascal, or Fortran may look the same and execute correctly after compilation on several different computer systems. The object code that the compiler produces for each machine, however, may be very different, even if the systems use the same instruction set.
This chapter is about machine language and assembly language, and how machine language programs are executed. We will start with a simple model for a computer system, explore how it is organized, and then study how the internal components exchange data. We will then focus on how each component contributes to the execution of a simple instruction, and step through the execution of a simple machine language program.

4.2 The System Bus Model Revisited

Before we look at the internal components of the CPU, we need to understand the relationship of the CPU to the other components of a computer system.

When a compiled program is executed, it is important to know where all of the action is happening. Figure 4-1 revisits the system bus model that we explored in Chapter 1.

[Figure 4-1: The system bus model of a computer system. The CPU (ALU, Registers, and Control), the Memory, and Input and Output (I/O) are connected by the Data Bus, the Address Bus, and the Control Bus.]

Not all of the components are connected to the system bus in the same way. The CPU generates addresses that are placed onto the address bus, and the memory receives addresses from the address bus. The memory never generates addresses, and the CPU never receives addresses, so there are no corresponding connections in those directions.

In a typical scenario, a user writes a high-level program, which a compiler translates into assembly language. An assembler then translates the assembly language program into machine code, which is stored on a disk. Prior to execution, the operating system loads the machine code program from the disk into main memory. During program execution, each instruction is brought from the memory into the CPU, one instruction at a time, along with any data that is needed to execute the instruction. The output of the program is placed on a device such as a video display or a disk. All of these operations are orchestrated by a control unit, which we will explore in detail in Chapter 9. Communication among the three components (CPU, Memory, and I/O) is handled with buses.

An important consideration is that instructions are executed inside the CPU, even though all of the instructions and data are initially stored in the memory. This means that instructions and data must be loaded from the memory into CPU registers, and results must be stored back to the memory from those registers.

In the remainder of this chapter, we will study an architecture that is based on the commercial Scalable Processor Architecture (SPARC) that was developed at Sun Microsystems in the mid-1980s. The SPARC has become a popular architecture since its introduction, which is partly due to its open nature: the full definition of the SPARC architecture is made readily available to the public (SPARC, 1992). In this chapter, we will look at just a subset of the SPARC, which we call the ARC. We will see more of the SPARC in the remaining chapters.

4.3 Memory

Computer memory consists of a collection of consecutively numbered registers. Each register is referred to as a memory location and stores exactly one binary value at any time. The number of bits in each memory location varies from system to system. A byte is a collection of eight adjacent bits (sometimes referred to as an octet) and is the smallest addressable memory location on many computers. A nibble is a less common term that refers to a collection of four adjacent bits. The meanings of the terms bit, byte, and nibble are generally agreed upon regardless of the specifics of an architecture, but the meaning of word depends upon the particular architecture. Typical word sizes are 16, 32, 64, and 128 bits, with the 32-bit word size being the common form for ordinary computers these days (and the 64-bit word growing in popularity). A comparison of these word sizes is shown in Figure 4-2.

[Figure 4-2: Common sizes for data types: bit, nibble (e.g., 0110), byte, 16-bit word (halfword), 32-bit word, 64-bit word (double), and 128-bit word (quad).]

Memory locations are arranged linearly in consecutive order, as shown in Figure 4-3. Each of the numbered locations corresponds to a specific stored word (a word is composed of four bytes here). The unique number that identifies each word is referred to as its address. Since addresses are counted in sequence beginning with zero, the highest address is one less than the size of the memory. The highest address for the $2^{32}$-byte memory is $2^{32} - 1$. The lowest address is 0.

[Figure 4-3: Memory map for the ARC example architecture (not drawn to scale). Starting at address 0: a region reserved for the operating system; user space beginning at address 2048; the system stack (top of stack, stack pointer, bottom of stack); and the I/O space, containing a disk, a terminal, and a printer. Words are 32 bits wide, and the memory is accessed through Address, Control, Data In, and Data Out connections.]

The ARC has a 32-bit address space, which means that a program can access a byte of memory anywhere in the range from 0 to $2^{32} - 1$. The address space for our example architecture is divided into distinct regions that are used for the operating system, input and output (I/O), user programs, and the system stack, which together comprise the memory map shown in Figure 4-3. The memory map differs from one implementation to another, which is partly why programs compiled for the same type of processor may not be compatible across systems.

The lower $2^{11} = 2048$ addresses of the memory map are reserved for use by the operating system. The user space is where a user's assembled program is loaded, and it can grow during operation from location 2048 until it meets up with the system stack. The system stack starts at location $2^{31} - 4$ and grows toward lower addresses. The portion of the address space between $2^{31}$ and $2^{32} - 1$ is reserved for I/O devices. The memory map is thus not entirely composed of real memory, and in fact there may be large gaps where neither real memory nor I/O devices exist. Since I/O devices are treated like memory locations, ordinary memory read and write commands can be used for reading and writing devices. This is referred to as memory-mapped I/O.
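To make the memory map concrete, here is a minimal Python sketch that classifies a byte address by region. The boundaries (2048, $2^{31} - 4$, $2^{31}$, and $2^{32} - 1$) come from the text; the constant and function names are hypothetical, and because user space and the stack grow toward each other with no fixed dividing line, the sketch reports them as a single region.

```python
# Minimal sketch of the ARC memory map described in the text.
# All names here are hypothetical; only the boundaries come from the text.

OS_LIMIT = 2**11         # addresses 0..2047 are reserved for the operating system
IO_BASE = 2**31          # addresses 2**31..2**32 - 1 are memory-mapped I/O
ADDR_MAX = 2**32 - 1     # highest byte address in a 32-bit address space
STACK_START = 2**31 - 4  # first stack word; the stack grows toward lower addresses


def region(addr: int) -> str:
    """Return which region of the ARC memory map a byte address falls in."""
    if not 0 <= addr <= ADDR_MAX:
        raise ValueError(f"address {addr:#x} is outside the 32-bit address space")
    if addr < OS_LIMIT:
        return "operating system"
    if addr < IO_BASE:
        # User programs grow up from 2048 and the stack grows down from
        # STACK_START; the boundary between them moves at run time.
        return "user space / stack"
    return "I/O"


if __name__ == "__main__":
    for a in (0, 2047, 2048, STACK_START, IO_BASE, ADDR_MAX):
        print(f"{a:#010x}  {region(a)}")
```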

It is important to keep the distinction clear between what is an address and what is data. An address in the ARC is 32 bits wide, and a word is also 32 bits wide, but they are not the same thing. An address is a pointer to a memory location, which holds data. Although the ARC has several data types (byte, halfword, integer, etc.), we will initially consider only the integer data type.

The ARC is a big-endian architecture, so named for the dispute over whether eggs should be broken on the big or the little end, which sparked a war among bickering politicians in Jonathan Swift's Gulliver's Travels. In a big-endian architecture such as the ARC, the address of a 32-bit word is also the address of its most significant byte, and the remaining bytes have consecutively larger addresses. In a little-endian architecture, the least significant byte of a 32-bit integer has the smallest address. The big-endian and little-endian formats are compared in Figure 4-4.

[Figure 4-4: Big-endian and little-endian formats. In the big-endian format, the bytes at addresses x, x+1, x+2, and x+3 run from the MSB (bit 31) to the LSB (bit 0); in the little-endian format, the bytes at x+3, x+2, x+1, and x run from the MSB to the LSB. The word address is x for both formats.]

The largest possible address in the ARC is $2^{32} - 1$, which points to the highest byte. This is the rightmost byte in a big-endian word, and so the address of the highest word in the memory map is three bytes to the left of this, or $2^{32} - 4$.

The size of a memory in bytes is usually expressed in units of K, which is $2^{10} = 1024$ locations, or M, which is $2^{20} = 1{,}048{,}576$ locations. A $2^{10}$-byte memory is said to be a 1 Kbyte (kilobyte) memory, and a memory with $2^{20}$ locations, each the size of a 32-bit word, is said to be a 1 Mword (megaword) memory. Notice that this notation is only used for memory: a K unit normally corresponds to $10^3$ and an M unit normally corresponds to $10^6$, as we see in Appendix A in the context of cycle times.
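The byte ordering can be checked with Python's built-in int.to_bytes and int.from_bytes. This sketch, with hypothetical function names, lists the bytes found at addresses x through x+3 for each convention and confirms the highest-word and memory-unit arithmetic above.

```python
# Sketch: where the four bytes of a 32-bit word land in memory under the
# big-endian (ARC) and little-endian conventions. Names are hypothetical.


def store_word(value: int, big_endian: bool = True) -> list:
    """Return the bytes at addresses x, x+1, x+2, x+3 for a word stored at x."""
    order = "big" if big_endian else "little"
    return list((value & 0xFFFFFFFF).to_bytes(4, order))


def load_word(data, big_endian: bool = True) -> int:
    """Reassemble a 32-bit word from the bytes at x, x+1, x+2, x+3."""
    order = "big" if big_endian else "little"
    return int.from_bytes(bytes(data), order)


HIGHEST_BYTE = 2**32 - 1         # largest ARC address
HIGHEST_WORD = HIGHEST_BYTE - 3  # that word starts three bytes to the left
KBYTE = 2**10                    # 1K = 1024 locations when sizing memory
MWORD_LOCATIONS = 2**20          # 1M = 1,048,576 locations

if __name__ == "__main__":
    print([hex(b) for b in store_word(0x12345678)])         # big-endian (ARC)
    print([hex(b) for b in store_word(0x12345678, False)])  # little-endian
```

Under these assumptions, storing 0x12345678 yields the bytes 0x12, 0x34, 0x56, 0x78 in big-endian order and the reverse sequence in little-endian order.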
Through the use of the system bus, data can be either read from or written to any location in the memory under the control of the CPU. When the CPU places an address on the address bus and also places a read control signal on the control bus, the addressed word is transferred from the memory to the CPU over the data bus. In a similar manner, data is written from the CPU into a memory location when the CPU places the address of the memory location to be written on the address bus, places the data to be written on the data bus, and directs the memory to write by placing a write control signal on the control bus.

4.4 Input and Output

One way to handle communication between devices and the rest of the machine is with special instructions and a special I/O bus reserved for this purpose. An alternative method for interacting with I/O devices, which we saw in the previous section, is memory-mapped I/O, in which devices occupy sections of the address space where no ordinary memory exists. Devices are accessed as if they are memory locations, so there is no need to handle devices in a special way.

Consider the memory map for the fictitious video game (the Stega) introduced in Chapter 1, which is illustrated in Figure 4-5.

[Figure 4-5: Memory map for the Stega video game. From low to high addresses: a region reserved for built-in bootstrap and graphics routines; plug-in game cartridge #1; plug-in game cartridge #2; an unused region; working memory, holding the system stack (top of stack, stack pointer, bottom of stack); and the I/O space, holding the Screen Flash, Joystick x, and Joystick y registers (the addresses FFFFEC$_{16}$, FFFFF0$_{16}$, and FFFFF4$_{16}$ appear in this region). Words are 32 bits wide.]

The Stega can accept up to two game cartridges. Each 32-bit word is composed of four 8-bit bytes in a big-endian format, just like the ARC.
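The read and write sequences above, together with memory-mapped I/O, can be modeled with a toy bus in Python. Everything here is an illustrative assumption rather than ARC hardware: the Bus class and its methods are hypothetical, memory is modeled as a dictionary of 32-bit words, unwritten locations read as 0, and the device address 0x80000000 (the first address of the ARC I/O space) is chosen arbitrarily.

```python
# Toy model of a system bus with memory-mapped I/O. The class and method
# names are hypothetical; the point is that a device register is reached
# with the same read/write operations as ordinary memory.

from typing import Callable, Dict, Optional, Tuple

ReadFn = Optional[Callable[[], int]]
WriteFn = Optional[Callable[[int], None]]


class Bus:
    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}                      # address -> 32-bit word
        self.devices: Dict[int, Tuple[ReadFn, WriteFn]] = {}  # device registers

    def map_device(self, addr: int, read: ReadFn = None, write: WriteFn = None) -> None:
        """Attach a device register at an address in the I/O space."""
        self.devices[addr] = (read, write)

    def read(self, addr: int) -> int:
        """CPU places addr on the address bus with a read signal; data comes back."""
        if addr in self.devices:
            read_fn, _ = self.devices[addr]
            return read_fn() if read_fn is not None else 0
        return self.memory.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        """CPU places addr and data on the buses with a write signal."""
        value &= 0xFFFFFFFF
        if addr in self.devices:
            _, write_fn = self.devices[addr]
            if write_fn is not None:
                write_fn(value)
        else:
            self.memory[addr] = value


if __name__ == "__main__":
    bus = Bus()
    flashes = []
    # Hypothetical: a "screen flash" register at the start of ARC I/O space.
    bus.map_device(0x80000000, write=flashes.append)
    bus.write(2048, 42)             # ordinary memory
    bus.write(0x80000000, 1)        # same operation, but it reaches a device
    print(bus.read(2048), flashes)  # 42 [1]
```

The design point is that the caller issues the same write for memory and for the device; only the address decides which one responds.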

The only real memory occupies the address space between $2^{22}$ and $2^{23} - 1$. (Remember: $2^{23} - 4$ is the address of the leftmost byte of the highest word in the big-endian format.) The rest of the address space is occupied by other components. The address space between 0 and $2^{16} - 1$ (inclusive) contains built-in programs for the power-on bootstrap operation and basic graphics routines. The address space beginning at $2^{16}$ is used for two plug-in game cartridges. Note that valid information is available only when the cartridges are physically inserted into the machine. Note also that the cartridges can be replaced with anything else that behaves like memory to the rest of the system. For instance, a musical keyboard can be inserted into one of the cartridge slots, and special operations can take place when certain locations are accessed. Finally, the address space between $2^{23}$ and $2^{24} - 1$ is used for I/O devices. For this system, the X and Y positions of a joystick are automatically updated in registers that are placed in the memory map. The registers are accessed by simply reading from the memory locations where they are located. The Screen Flash location causes the screen to flash whenever it is written.

Suppose that we would like to write a simple program that flashes the screen whenever the joystick is moved. The flowchart in Figure 4-6 illustrates how this might be done. The X and Y registers are first read, and are then compared with the previous X and Y values. If either position has changed, then the screen is flashed, the previous X and Y values are updated, and the process repeats. If neither position has changed, then the process simply repeats. This is an example of the programmed I/O method of accessing a device. (See problem 4.3 at the end of the chapter for a more detailed description.)

4.5 The CPU

Now that we are familiar with the basic components of the system bus model, we are ready to explore the inside of a CPU.
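The flowchart's polling loop can be sketched in Python as follows. The register reads and the screen flash are passed in as functions standing in for the memory-mapped registers; these names, the bounded number of polls, and seeding the saved values with an initial read are assumptions made so the sketch is self-contained and testable.

```python
# Sketch of the programmed I/O loop in the joystick flowchart. The reader and
# flash functions stand in for memory-mapped registers; their names, and the
# choice to seed the "old" values with an initial read, are assumptions.

from typing import Callable


def track_joystick(read_x: Callable[[], int],
                   read_y: Callable[[], int],
                   flash: Callable[[], None],
                   polls: int) -> int:
    """Poll the joystick `polls` times, flashing the screen on every change.

    Returns how many times the screen was flashed. A real program would
    loop forever; the bound only makes the sketch testable.
    """
    old_x, old_y = read_x(), read_y()
    flashes = 0
    for _ in range(polls):
        x = read_x()                  # read joystick X register
        y = read_y()                  # read joystick Y register
        if x != old_x or y != old_y:  # did X or Y change?
            flash()                   # write to the Screen Flash location
            flashes += 1
            old_x, old_y = x, y       # update the saved X and Y values
    return flashes


if __name__ == "__main__":
    xs = iter([0, 0, 5, 5, 5])
    ys = iter([0, 0, 0, 3, 3])
    n = track_joystick(lambda: next(xs), lambda: next(ys),
                       lambda: print("flash"), polls=4)
    print(n)
```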
At a minimum, the CPU consists of a data section, which contains registers and an ALU, and a control section, as illustrated in Figure 4-7. The data section is also referred to as the datapath. The control unit of a computer is responsible for executing a program that is stored in the main memory. The object code is interpreted by the control unit one instruction at a time. The steps that the control unit carries out in executing a program are:

1) Fetch an instruction from main memory.
2) Decode the opcode, which identifies the instruction.
3) Read operand(s) from main memory, if any.
4) Execute the instruction and store results.
5) Go to Step 1.

This is known as the fetch-execute cycle. For instance, when adding two numbers, the control unit must fetch the instruction, determine that the instruction is in fact an addition instruction, retrieve the operands from their source registers, initiate the addition process, and store the result back into a register.

[Figure 4-6: Flowchart illustrating the control structure of a program that tracks a joystick: read the joystick X register; read the joystick Y register; compare the old X and Y values to the new values; if X or Y changed, flash the screen and update the X and Y registers; then repeat.]

[Figure 4-7: High-level view of a CPU, showing the control section (control unit) and the datapath, or data section (registers and ALU).]

The control unit may also need to access I/O devices such as disks, a keyboard, or a video display. The control unit is responsible for coordinating these different units in the execution of a computer program, and it can be thought of as a computer within a computer, in the sense that it makes decisions as to how the rest of the machine behaves.

The datapath is made up of a register file and the arithmetic and logic unit (ALU), as shown in Figure 4-8. The register file is a small memory, separate from the system memory, that is used as a scratchpad during computation. Typical sizes for a register file range from a few to a few thousand registers. Like the system memory, each register file location is assigned an address in sequence starting from zero. The major differences between the register file and the system memory are that the register file is contained within the CPU, and that it is much faster as a result of its smaller size and the use of high-speed circuitry. An instruction that operates on data from the register file can often run ten times faster than the same instruction operating on data in memory. For this reason, register-intensive programs are faster than the equivalent memory-intensive programs, even if it takes more register operations to do the same tasks that would require fewer operations with the memory.

The heart of the processing unit is the ALU. The ALU is a combinational logic unit that implements a variety of binary operations. The operations, and the registers to be used during the operations, are selected by the control unit. Figure 4-8 shows an ALU that is connected to a register file. There are two source operand inputs to the ALU that come from the register file, which are labeled Register Source 1 (rs1) and Register Source 2 (rs2). An output from the ALU, labeled Register Destination (rd), sends results back to the register file. In most systems these connections also include a path to the system bus so that memory and devices can be accessed. This is shown as the three connections labeled From Data Bus, To Data Bus, and To Address Bus.

[Figure 4-8: The datapath for an example ARC implementation. The register file supplies Register Source 1 (rs1) and Register Source 2 (rs2) to the ALU, and the ALU output, Register Destination (rd), returns to the register file. The control unit selects the registers and the ALU function and receives status from the ALU. Connections labeled From Data Bus, To Data Bus, and To Address Bus link the datapath to the system bus.]

The control unit has two functions. It first interprets the current instruction being executed and generates the appropriate control signals for the processing unit and the bus interface unit. Then, it sequences the CPU to the next instruction in the program.

4.6 An Instruction Set Architecture

One method of describing a computer architecture is in terms of the instruction set, which consists of all of the operations visible to a user that the architecture is capable of executing, such as addition, logical AND, or subroutine calls. At this level of description, the architecture is referred to as an instruction set architecture (ISA). The ISA defines instructions, registers, the memory, and an algorithm for controlling instruction execution. We will explore all of these ISA aspects here except for the control algorithm, which we will study in Chapter 9.
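The five steps of the fetch-execute cycle described in Section 4.5 can be illustrated with a toy interpreter. This is a hedged sketch, not the ARC: the tuple instruction encoding, the opcode names (load, add, store, halt), and the one-entry-per-location memory are invented for illustration. Instructions and data share the same memory list, as they share main memory in the system bus model.

```python
# Toy fetch-execute loop. The tuple-based instruction encoding and the
# opcode names are invented for illustration; they are not ARC encodings.
# Memory holds both instructions and data, one entry per location.


def run(memory: list, registers: list, max_steps: int = 1000) -> list:
    pc = 0
    for _ in range(max_steps):
        instruction = memory[pc]         # 1) fetch from main memory
        opcode, *operands = instruction  # 2) decode the opcode
        pc += 1                          # default: next instruction in sequence
        if opcode == "halt":
            return registers
        if opcode == "load":             # 3) read operand from main memory
            addr, rd = operands
            registers[rd] = memory[addr]
        elif opcode == "add":            # 4) execute and store the result
            rs1, rs2, rd = operands
            registers[rd] = (registers[rs1] + registers[rs2]) & 0xFFFFFFFF
        elif opcode == "store":
            rs, addr = operands
            memory[addr] = registers[rs]
        else:
            raise ValueError(f"unknown opcode {opcode!r} at location {pc - 1}")
        # 5) go to step 1
    raise RuntimeError("step limit reached without halt")


if __name__ == "__main__":
    program = [
        ("load", 5, 1),    # r1 <- mem[5]
        ("load", 6, 2),    # r2 <- mem[6]
        ("add", 1, 2, 3),  # r3 <- r1 + r2
        ("store", 3, 7),   # mem[7] <- r3
        ("halt",),
        10, 32, 0,         # data at locations 5, 6, 7
    ]
    regs = run(program, [0] * 4)
    print(regs[3], program[7])  # 42 42
```

In this example the program adds the data words at locations 5 and 6 (10 and 32) and stores 42 at location 7.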

ARC: A REDUCED INSTRUCTION SET COMPUTER

There are approximately 200 instructions in the ARC ISA, 15 of which are shown in Figure 4-9. The upper 10 instructions deal with registers, while the lower five do not.

[Figure 4-9: A subset of the instruction set for the ARC ISA. In the figure, the instructions are grouped into Memory, Logical, Arithmetic, and Control categories.]

  Mnemonic  Meaning
  ld        Load a register from memory
  st        Store a register into memory
  sethi     Load the 22 most significant bits of a register
  andcc     Bitwise logical AND
  orcc      Bitwise logical OR
  orncc     Bitwise logical NOR
  srl       Shift right (logical)
  addcc     Add
  call      Call subroutine
  jmpl      Jump and link (return from subroutine call)
  be        Branch if equal
  bneg      Branch if negative
  bcs       Branch on carry
  bvs       Branch on overflow
  ba        Branch always

The ld and st instructions transfer a word between the main memory and one of the ARC registers. These are the only instructions that can access memory. The sethi instruction sets the 22 most significant bits (MSBs) of a register, and can be used to construct an arbitrary 32-bit word in a register. The andcc, orcc, and orncc instructions perform a bit-by-bit AND, OR, and NOR operation, respectively, on their operands. For the andcc instruction, each bit of the result is a 1 if the corresponding bits of both operands are 1; otherwise the result is 0. For the orcc instruction, each bit of the result is a 1 if either or both of the corresponding bits in the operands are 1; otherwise the result is 0. The orncc operation is the complement of orcc, so each bit of the result is 0 if either or both of the corresponding bits in the operands are 1; otherwise the result is 1. The cc suffixes are part of the instruction names (mnemonics) and have a meaning that is described later.

The srl (shift right logical) instruction shifts a register to the right and copies zeros into the leftmost bit(s). This is in contrast to a shift right arithmetic instruction, which is supported in some architectures, in which the leftmost bit of the original register is copied into the newly vacated bit(s) on the left side of the register. The addcc instruction performs a 32-bit two's complement addition on its operands.

The call and jmpl instructions form a pair that are used in calling and returning from a subroutine, respectively. The lower five instructions deal with conditionals. The be, bneg, bcs, bvs, and ba instructions cause a branch in the execution of a program, and are used in implementing high-level constructs such as if-then-else and do-while. Detailed descriptions of the instructions and examples of their usage are given in the sections that follow.

ARC ASSEMBLY LANGUAGE FORMAT

We can use any format for an assembly language program, and Figure 4-10 shows a suggested format for the commercial SPARC assembly language.

[Figure 4-10: Format for a SPARC assembly language statement. Fields, left to right: Label, Mnemonic, Source operands, Destination operand, Comment. Example:
lab_1: addcc %r1, %r2, %r3 ! Sample assembly code]

The format consists of four fields: a label, an instruction, the operands, and a comment. The label is optional, and not every line in an assembly language program will have one. A label may consist of any combination of alphabetic or numeric characters, underscores (_), dollar signs ($), or periods (.), as long as the first character is not a digit. A label must be followed by a colon. The language is case sensitive, so a distinction is made between upper and lower case letters. The language is free format in the sense that any field can begin in any column, but the relative left-to-right ordering must be maintained. If a label appears in a line of assembly code, it will be in the leftmost position. To the right of the label field is the instruction field, which always appears in lower case form. For this example, the addcc instruction specifies an addition operation. The operand field follows to the right of the instruction field. The ARC architecture contains 32 data registers, labeled %r0 through %r31, that each hold a 32-bit word.
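The bitwise behavior of the andcc, orcc, orncc, and srl instructions introduced above, and the contrast between logical and arithmetic right shifts, can be sketched in Python. This is an illustrative model that ignores condition codes; the function names mirror the mnemonics but are otherwise hypothetical, orncc follows this chapter's NOR definition, and sra is included only for comparison since it is not in the ARC subset of Figure 4-9.

```python
# Sketch of the bit-level behavior of andcc, orcc, orncc, and srl on 32-bit
# registers, plus an arithmetic shift (sra) for contrast. Condition codes are
# left out here. Note: orncc follows this chapter's NOR definition; the
# commercial SPARC orn instruction instead computes rs1 OR (NOT operand2).

MASK32 = 0xFFFFFFFF


def andcc(a: int, b: int) -> int:
    return (a & b) & MASK32


def orcc(a: int, b: int) -> int:
    return (a | b) & MASK32


def orncc(a: int, b: int) -> int:
    return ~(a | b) & MASK32    # complement of orcc (NOR)


def srl(a: int, n: int) -> int:
    return (a & MASK32) >> n    # zeros enter on the left


def sra(a: int, n: int) -> int:
    a &= MASK32
    if a & 0x80000000:          # reinterpret as a negative two's complement value
        a -= 1 << 32
    return (a >> n) & MASK32    # Python's >> on a negative int copies the sign bit
```

For example, shifting 0x80000000 right by 4 gives 0x08000000 with srl but 0xF8000000 with sra, because the arithmetic shift copies the sign bit into the vacated positions.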
There is also a 32-bit Processor State Register (PSR) that describes the current state of the processor, and a 32-bit program counter (PC) that keeps track of the instruction being executed, as illustrated in Figure 4-11. The PSR is labeled %psr and the PC register is labeled %pc. Register %r0 always contains the value 0, which cannot be changed. Registers %r14 and %r15 have additional uses as a stack pointer (%sp) and a link register, respectively, which are described later.

[Figure 4-11: User-visible registers in the ARC: the 32-bit registers %r0 through %r31 (Register 00 through Register 31), where %r0 always holds 0, %r14 doubles as the stack pointer %sp, and %r15 serves as the link register; plus the 32-bit PSR (%psr) and PC (%pc).]

Operands in an assembly language statement are separated by commas, and the destination operand always appears in the rightmost position in the operand field. Thus, the example shown in Figure 4-10 specifies adding registers %r1 and %r2, with the result placed in %r3. If %r0 appears in the destination operand field instead of %r3, the result is discarded. The default base for a numeric operand is 10, and so the assembly language statement:

addcc %r1, 12, %r3

shows an operand of $(12)_{10}$ that will be added to %r1, with the result placed in %r3. If a pound sign (#) appears in front of an operand, the operand is interpreted in hexadecimal. The comment field follows the operand field; it begins with an exclamation mark (!) and terminates at the end of the line. Not all lines of an ARC assembly language program will contain all four fields. Some lines may consist of only comments, and some lines may be entirely blank.

ARC INSTRUCTION FORMATS

The instruction format defines how the various bit fields of an instruction are interpreted. The ARC architecture has just a few instruction formats, and we will take a simplified view of these formats here, in which a few fields are omitted. The five formats are SETHI, Branch, Call, Arithmetic, and Memory, as shown in Figure 4-12. Each instruction has a mnemonic form, such as ld, and an opcode. A particular instruction format may have more than one opcode field; the opcode fields collectively identify an instruction in one of its various forms.

[Figure 4-12: Instruction formats and PSR format for the SPARC.
SETHI format: op, rd, op2, imm22
Branch format: op, 0, cond, op2, disp22
CALL format: op, disp30
Arithmetic formats (op = 10): op, rd, op3, rs1, i = 0, rs2; or op, rd, op3, rs1, i = 1, simm13
Memory formats (op = 11): op, rd, op3, rs1, i = 0, rs2; or op, rd, op3, rs1, i = 1, simm13
The figure also tabulates the op2 codes (branch, sethi), the op3 codes for op = 10 (addcc, andcc, orcc, orncc, srl, jmpl) and for op = 11 (ld, st), the cond codes (be, bcs, bneg, bvs, ba), and the PSR condition code bits n, z, v, and c.]

The leftmost two bits of each instruction form the op (opcode) field, which identifies the format. The SETHI and Branch formats both contain 00 in the op field, and so they can be considered together as the SETHI/Branch format. The actual SETHI or Branch format is determined by the bit pattern in the op2 opcode field (010 = Branch; 100 = SETHI). Bit 29 in the Branch format always contains a zero. The five-bit rd field identifies the target register for the SETHI operation. The cond field identifies the type of branch, based on the condition code bits (n, z, v, and c) in the PSR, as indicated at the bottom of Figure 4-12. The result of executing an instruction whose mnemonic ends with cc sets the condition code bits such that n = 1 if the result of the operation is negative; z = 1 if the result is zero; v = 1 if the operation causes an overflow; and c = 1 if the operation produces a carry. Instructions that do not end in cc do not affect the condition codes. The imm22 and disp22 fields each hold a 22-bit constant that is used as the operand for the SETHI format (for imm22) or for calculating a displacement for a branch address (for disp22). The CALL format contains only two fields: the op field, which contains the bit pattern 01, and the disp30 field, which contains a 30-bit displacement that is used in calculating the address of the called routine.

The Arithmetic (op = 10) and Memory (op = 11) formats both make use of rd fields to identify either a source register for st or a destination register for the remaining Arithmetic and Memory format instructions. The rs1 field identifies the first source register, and the rs2 field identifies the second source register. The op3 opcode field identifies the instruction according to the op3 tables shown in Figure 4-12. The simm13 field is a 13-bit immediate value that is sign extended to 32 bits for the second source when the i (immediate) field is 1. Sign extension means that the leftmost bit of the simm13 field (the sign bit) is copied to the left into the remaining bits that make up a 32-bit integer, before the value is added to rs1 in this case. This ensures that a two's complement negative number remains negative (and a two's complement positive number remains positive). For instance, $(-13)_{10} = (1111111110011)_2$, and after sign extension to a 32-bit integer, we have $(11111111111111111111111111110011)_2$, which is still equivalent to $(-13)_{10}$.

The Arithmetic instructions need two source operands and a destination operand, for a total of three operands.
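The following Python sketch shows simm13 sign extension and one way to compute the n, z, v, and c bits for a 32-bit addition such as addcc. The flag meanings follow the text; the function names are hypothetical, and the overflow test (both operands share a sign that the result does not have) is the standard two's complement rule rather than something taken from the ARC definition.

```python
# Sketch of simm13 sign extension and of how addcc could set n, z, v, c.
# Function names are hypothetical; the flag meanings follow the text.

MASK32 = 0xFFFFFFFF


def sign_extend(field: int, bits: int = 13) -> int:
    """Copy the sign bit of a `bits`-wide field into bits `bits` through 31."""
    field &= (1 << bits) - 1
    if field & (1 << (bits - 1)):  # sign bit set: fill the upper bits with 1s
        field |= MASK32 & ~((1 << bits) - 1)
    return field


def to_signed(word: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def addcc(a: int, b: int):
    """32-bit add returning (result, condition codes)."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = {
        "n": result >> 31,                               # result is negative
        "z": int(result == 0),                           # result is zero
        # overflow: both operands share a sign that the result does not
        "v": (((a ^ result) & (b ^ result)) >> 31) & 1,
        "c": int(total > MASK32),                        # carry out of bit 31
    }
    return result, flags


if __name__ == "__main__":
    imm = sign_extend(0b1111111110011)  # (-13) in 13 bits
    print(hex(imm), to_signed(imm))     # 0xfffffff3 -13
    print(addcc(20, imm))               # (7, {'n': 0, 'z': 0, 'v': 0, 'c': 1})
```

Adding 20 to the sign-extended pattern for -13 produces 7 with c = 1: the carry out of bit 31 is expected when a negative two's complement number is added to a larger positive one, and it does not indicate overflow.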
The Memory instructions only need two operands: one for the address and one for the data. The remaining source operand is also used for the address, however. The operands in the rs1 and rs2 fields are added to obtain the address when i = 0. When i = 1, the rs1 field and the simm13 field are added to obtain the address. For the first few examples we will encounter, %r0 will be used for rs1, and so only the remaining source operand will be specified.

ARC DATA FORMATS

The ARC supports 12 different data formats, as illustrated in Figure 4-13. The data formats are grouped into three types: signed integer, unsigned integer, and floating point. Within these types, allowable format widths are byte (8 bits), halfword (16 bits), word/singleword (32 bits), tagged word (32 bits, in which the two least significant bits form a tag and the most significant 30 bits form the value), doubleword (64 bits), and quadword (128 bits).

[Figure 4-13: ARC data formats. Signed formats: signed integer byte, halfword, word, and double, each with a sign bit s in the leftmost position. Unsigned formats: unsigned integer byte, halfword, word, and double, plus the tagged word. Floating point formats: single, double, and quad, each consisting of a sign bit s, an exponent, and a fraction.]

The unsigned byte, halfword, word, and double formats are invoked by using a particular subset of the SPARC instruction set, which is more fully described in (SPARC, 1992). The instructions we have seen up to this point deal only with unsigned integers. The signed versions are invoked by a different subset of the SPARC instruction set, and differ in the way that condition codes are handled. For example, a 1 in the most significant bit of a signed integer means that the integer is negative, whereas it has no influence on the (positive) sign of an unsigned integer. Thus, the n (negative) condition will be set differently for each case.

The tagged word uses the two least significant bits to indicate overflow, in which an attempt is made to store a value into the word that is larger than 30 bits. Tagged arithmetic operations are used in languages with dynamically typed data, such as Lisp and Smalltalk. In its generic form, a 1 in either bit of the tag field indicates an overflow situation for that word. The tags can be used to ensure proper alignment conditions (that words begin on four-byte boundaries, doublewords begin on eight-byte boundaries, etc.), particularly for pointers.

The floating point formats conform to the IEEE standard (see Chapter 2). Again, there are special instructions that invoke the floating point formats, as described in (SPARC, 1992).

ARC INSTRUCTION DESCRIPTIONS

Now that we know the instruction formats, we can create detailed descriptions of the 15 instructions listed in Figure 4-9, which are given below. The translation to object code is provided only as a reference, and is described in detail in the next chapter. In the descriptions below, a reference to the contents of a memory location (for ld and st) is indicated by square brackets, as in ld [x], %r1, which copies the contents of location x into %r1. A reference to the address of a memory location is specified directly, without brackets, as in call sub_r, which makes a call to subroutine sub_r. Only ld and st can access memory, so only ld and st use brackets.
Registers are always referred to in terms of their contents, never in terms of an address, so there is no need to enclose references to registers in brackets.

Instruction: ld
Description: Load a register from main memory. The memory address must be aligned on a word boundary (that is, the address must be evenly divisible by 4). The address is computed by adding the rs1 field to either the rs2 field or the simm13 field, as appropriate for the context.
Example usage: ld [x], %r1
Meaning: Copy the contents of memory location x into register %r1.
Object code: (x = 2064)

Instruction: st
Description: Store a register into main memory. The memory address must be aligned on a word boundary. The address is computed by adding the rs1 field to either the rs2 field or the simm13 field, as appropriate for the context. The rd field of this instruction is actually used for the source register.
Example usage: st %r1, [x]
Meaning: Copy the contents of register %r1 into memory location x.
Object code: (x = 2064)

Instruction: sethi
Description: Set the high 22 bits and zero the low 10 bits of a register. If the operand is 0 and the register is %r0, then the instruction behaves as a no-op (NOP), which means that no operation takes place.
Example usage: sethi #304F15, %r1
Meaning: Set the high 22 bits of %r1 to $(304F15)_{16}$, and set the low 10 bits to zero.
Object code:

Instruction: andcc
Description: Bitwise AND the source operands into the destination operand. The condition codes are set according to the result.
Example usage: andcc %r1, %r2, %r3
Meaning: Logically AND %r1 and %r2 and place the result in %r3.
Object code:

Instruction: orcc
Description: Bitwise OR the source operands into the destination operand. The condition codes are set according to the result.
Example usage: orcc %r1, 1, %r1
Meaning: Set the least significant bit of %r1 to 1.
Object code:

Instruction: orncc
Description: Bitwise NOR the source operands into the destination operand. The condition codes are set according to the result.
Example usage: orncc %r1, %r0, %r1
Meaning: Complement %r1.
Object code:

Instruction: srl
Description: Shift a register to the right by 0 to 31 bits. The vacant bit positions on the left side of the shifted register are filled with 0s.
Example usage: srl %r1, 3, %r2
Meaning: Shift %r1 right by three bits and store the result in %r2. Zeros are copied into the three most significant bits of %r2.
Object code:

Instruction: addcc
Description: Add the source operands into the destination operand using two's complement arithmetic. The condition codes are set according to the result.
Example usage: addcc %r1, 5, %r1
Meaning: Add 5 to %r1.
Object code:

Instruction: call
Description: Call a routine and store the address of the current instruction (where the call itself is stored) in %r15, which effects a call and link operation. In the assembled code, the disp30 field in the CALL format will contain a 30-bit displacement from the address of the call instruction. The address of the next instruction to be executed is computed by adding 4 × disp30 (which shifts disp30 to the high 30 bits of the 32-bit address) to the address of the current instruction. Note that disp30 can be negative.
Example usage: call sub_r
Meaning: Call a subroutine that begins at location sub_r. For the object code shown below, sub_r is 25 words (100 bytes) farther in memory than the call instruction.
Object code:

Instruction: jmpl
Description: Jump and link (return from subroutine). Jump to a new address and store the address of the current instruction (where the jmpl instruction is located) in the destination register.
Example usage: jmpl %r15 + 4, %r0
Meaning: Return from subroutine. The value of the PC for the call instruction was previously saved in %r15, so the return address should be computed for the instruction that follows the call, at %r15 + 4. The current address is discarded in %r0.
Object code:

Instruction: be
Description: If the z condition code is 1, then branch to the address computed by adding 4 × disp22 in the Branch instruction format to the address of the current instruction. If the z condition code is 0, then control is transferred to the instruction that follows be.
Example usage: be label
Meaning: Branch to label if the z condition code is 1. For the object code shown below, label is five words (20 bytes) farther in memory than the be instruction.
Object code:
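The register-level effect of several entries above can be summarized in a few lines of code. The following is a minimal Python sketch, assuming 32-bit registers held as Python ints and %r0 reading as zero; the function names are hypothetical, the logical functions compute the register result only (condition codes are not modeled here), orncc follows the NOR definition given in its entry, and call and branch targets use the 4 × displacement rule with a sign-extended disp30 or disp22 field. The starting PC of 2048 in the examples is also an assumption.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sethi(imm22: int) -> int:
    """High 22 bits <- imm22, low 10 bits <- 0."""
    return (imm22 << 10) & MASK32


# Register results only; the cc variants would also set n, z, v, c.
def andcc(a: int, b: int) -> int:
    return a & b & MASK32


def orcc(a: int, b: int) -> int:
    return (a | b) & MASK32


def orncc(a: int, b: int) -> int:
    """Bitwise NOR, as the orncc entry defines it."""
    return ~(a | b) & MASK32


def srl(a: int, shift: int) -> int:
    """Logical right shift by 0 to 31 bits; vacated high bits become 0."""
    return (a & MASK32) >> (shift & 0x1F)


def target(pc: int, disp: int, bits: int) -> int:
    """Call or branch target: PC + 4 * sign-extended displacement field."""
    return (pc + 4 * sign_extend(disp, bits)) & MASK32


r1 = sethi(0x304F15)
print(hex(r1))                           # 0xc13c5400
print(hex(orcc(r1, 1)))                  # 0xc13c5401 (least significant bit set)
print(hex(orncc(r1, 0)))                 # 0x3ec3abff (complement, since %r0 is 0)
print(hex(srl(r1, 3)))                   # 0x18278a80
print(target(2048, 25, 30))              # 2148: call sub_r, 25 words ahead
print(target(2048, 5, 22))               # 2068: be label, 5 words ahead
print(target(2048, (1 << 22) - 5, 22))   # 2028: a disp22 of -5, 5 words back
```

The last line shows why sign extension matters for disp22 and disp30: the same field width encodes both forward and backward branches.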

Instruction: bneg
Description: If the n condition code is 1, then branch to the address computed by adding 4 × disp22 in the Branch instruction format to the address of the current instruction. If the n condition code is 0, then control is transferred to the instruction that follows bneg.
Example usage: bneg label
Meaning: Branch to label if the n condition code is 1. For the object code shown below, label is five words farther in memory than the bneg instruction.
Object code:

Instruction: bcs
Description: If the c condition code is 1, then branch to the address computed by adding 4 × disp22 in the Branch instruction format to the address of the current instruction. If the c condition code is 0, then control is transferred to the instruction that follows bcs.
Example usage: bcs label
Meaning: Branch to label if the c condition code is 1. For the object code shown below, label is five words farther in memory than the bcs instruction.
Object code:

Instruction: bvs
Description: If the v condition code is 1, then branch to the address computed by adding 4 × disp22 in the Branch instruction format to the address of the current instruction. If the v condition code is 0, then control is transferred to the instruction that follows bvs.
Example usage: bvs label
Meaning: Branch to label if the v condition code is 1. For the object code shown below, label is five words farther in memory than the bvs instruction.
Object code:

Instruction: ba
Description: Branch to the address computed by adding 4 × disp22 in the Branch instruction format to the address of the current instruction.
Example usage: ba label
Meaning: Branch to label regardless of the settings of the condition codes. For the object code shown below, label is five words earlier in memory than the ba instruction.
Object code:

PSEUDO-OPS

In addition to the ARC instructions that are supported by the architecture, there are also pseudo-operations (pseudo-ops) that instruct the assembler to perform an operation at assembly time. A list of pseudo-ops and examples of their usage is shown in Figure 4-14.

| Pseudo-Op | Usage | Meaning |
|---|---|---|
| .equ | X .equ #10 | Treat symbol X as $(10)_{16}$ |
| .begin | .begin | Start assembling |
| .end | .end | Stop assembling |
| .org | .org 2048 | Change location counter to 2048 |
| .dwb | .dwb 25 | Reserve a block of 25 words |
| .global | .global Y | Y is used in another module |
| .extern | .extern Z | Z is defined in another module |
| .macro | .macro M a, b, ... | Define macro M with formal parameters a, b, ... |
| .endmacro | .endmacro | End of macro definition |
| .if | .if <cond> | Assemble if <cond> is true |
| .endif | .endif | End of .if construct |

Figure 4-14 Pseudo-ops for the ARC assembly language.

These pseudo-ops are not specific to the ARC, nor do they appear in the official definition of the SPARC assembly language. In fact, they are generic to several assembly languages, although the names and specification of arguments may differ. Any number of assembly languages can be used to write SPARC programs, in the same way that one computer can support a number of high-level languages such as Pascal, C, and Fortran. The bit patterns for the instructions, however, are always interpreted in the same way.

The .equ pseudo-op instructs the assembler to equate a value or a character string with a symbol, so that the symbol can be used throughout a program as if the value or string were written in its place. The .begin and .end pseudo-ops tell the assembler when to start and stop assembling. Any statements that appear before .begin or after .end are ignored. A single program may have more than one .begin/.end pair, but there must be a .end for every .begin, and there must be at least one .begin. The .begin and .end pseudo-ops are helpful in making portions of the program invisible to the assembler during debugging. The .org (origin) pseudo-op changes the value of the location counter, and thereby forces the code that follows into the section of main memory that begins at the argument to .org (2048 in Figure 4-14). The .dwb (define word block) pseudo-op reserves a block of four-byte words, typically for an array. The location counter is moved ahead of the block according to the number of words specified by the argument to .dwb.
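The location-counter bookkeeping implied by .begin, .end, .equ, .org, and .dwb can be sketched as a tiny first pass over a list of statements. This is a hypothetical, greatly simplified Python model, not the ARC assembler's actual implementation: it assumes each statement is a (label, op, arg) tuple, that every instruction or data word occupies 4 bytes, that .dwb n advances the location counter by 4n bytes, and that .equ records a symbol value without reserving memory. The op names "nop" and "word" in the example are placeholders.

```python
def assign_addresses(statements):
    """Hypothetical first pass: map labels and .equ symbols to values.

    Each statement is (label, op, arg); label may be None. Every statement
    that is not a pseudo-op is assumed to occupy one 4-byte word.
    """
    location = 0                      # location counter, in bytes
    symbols = {}
    assembling = False
    for label, op, arg in statements:
        if op == ".begin":
            assembling = True
        elif op == ".end":
            assembling = False
        elif not assembling:
            continue                  # statements outside .begin/.end are ignored
        elif op == ".equ":
            symbols[label] = arg      # defines a symbol; reserves no memory
        else:
            if label is not None:
                symbols[label] = location     # a label names the current address
            if op == ".org":
                location = symbols[arg] if isinstance(arg, str) else arg
            elif op == ".dwb":
                location += 4 * arg           # reserve `arg` four-byte words
            else:
                location += 4                 # one instruction or data word
    return symbols


program = [
    (None, "nop", None),          # before .begin, so ignored
    (None, ".begin", None),
    ("a_start", ".equ", 3000),
    (None, ".org", 2048),
    ("start", "ld", None),
    (None, "jmpl", None),
    ("table", ".dwb", 25),        # 25 words = 100 bytes
    ("next", "word", 0),          # "word" stands for one data word
    (None, ".org", "a_start"),
    ("a", "word", 25),
    (None, ".end", None),
]
print(assign_addresses(program))
# {'a_start': 3000, 'start': 2048, 'table': 2056, 'next': 2156, 'a': 3000}
```

A real assembler would also report errors such as an undefined symbol in .org or a missing .end; the sketch omits these checks.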

The .global and .extern pseudo-ops deal with names of variables and addresses that are defined in one assembly code module and used in another. The .global pseudo-op makes a label available for use in other modules. The .extern pseudo-op identifies a label that is used in the local module and is defined in another module (which should be marked with a .global in that module). We will see how .global and .extern are used when linking and loading are covered in the next chapter. The .macro, .endmacro, .if, and .endif pseudo-ops are also covered in the next chapter.

4.8 Assembly Language Programming

The process of writing an assembly language program is similar to the process of writing a high-level program, except that many of the details that are abstracted away in high-level programs are made explicit in assembly language programs. Consider writing an ARC assembly language program that adds the numbers 15 and 9. One possible coding is shown in Figure 4-15.

```
! This program adds two numbers
        .begin
        .org 2048
prog1:  ld      [x], %r1        ! Load x into %r1
        ld      [y], %r2        ! Load y into %r2
        addcc   %r1, %r2, %r3   ! %r3 ← %r1 + %r2
        st      %r3, [z]        ! Store %r3 into z
        jmpl    %r15 + 4, %r0   ! Return
x:      15
y:      9
z:      0
        .end
```

Figure 4-15 An ARC assembly language program.

The program begins and ends with a .begin/.end pair. The .org pseudo-op instructs the assembler to begin assembling so that the assembled code is loaded into memory starting at location 2048. The operands 15 and 9 are stored in variables x and y, respectively. In the ARC we can only add numbers that are stored in registers (because only ld and st can access main memory), so the program begins by loading registers %r1 and %r2 with x and y. The addcc instruction adds %r1 and %r2 and places the result in %r3. The st instruction then stores %r3 in memory location z.
The jmpl instruction with operands %r15 + 4, %r0 causes a return to the next instruction in the calling routine, which is the operating system if this is the highest level of a user's program, as we can assume it is here. The variables x, y, and z follow the program.
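To trace what Figure 4-15 does to registers and memory, the following Python sketch interprets its five instructions over a dictionary that stands in for main memory. It is a hypothetical model, not an ARC simulator: it assumes that .org 2048 places prog1 at address 2048 and that each instruction and data word occupies 4 bytes (so x, y, and z land at 2068, 2072, and 2076), it does not model condition codes, and jmpl simply ends the run instead of returning to a caller.

```python
MASK32 = 0xFFFFFFFF

# Assumed layout: five 4-byte instructions at 2048..2064, then x, y, z.
X, Y, Z = 2068, 2072, 2076

program = [
    ("ld", X, 1),          # ld [x], %r1
    ("ld", Y, 2),          # ld [y], %r2
    ("addcc", 1, 2, 3),    # addcc %r1, %r2, %r3
    ("st", 3, Z),          # st %r3, [z]
    ("jmpl",),             # jmpl %r15 + 4, %r0 (return; ends this model)
]


def check_aligned(addr):
    """ld and st require word-aligned addresses."""
    if addr % 4:
        raise ValueError(f"address {addr} is not word aligned")


def run(program, memory):
    regs = [0] * 32        # regs[0] stays 0: this program never targets %r0
    for instr in program:
        op = instr[0]
        if op == "ld":
            _, addr, rd = instr
            check_aligned(addr)
            regs[rd] = memory[addr]
        elif op == "st":
            _, rs, addr = instr
            check_aligned(addr)
            memory[addr] = regs[rs]
        elif op == "addcc":
            _, rs1, rs2, rd = instr
            regs[rd] = (regs[rs1] + regs[rs2]) & MASK32
        elif op == "jmpl":
            break
    return regs, memory


regs, memory = run(program, {X: 15, Y: 9, Z: 0})
print(regs[3], memory[Z])   # 24 24
```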

In practice, the SPARC code equivalent to the ARC code shown in Figure 4-15 is not entirely correct. The ld, st, and jmpl instructions all take two instruction cycles to complete, and each needs to be followed by an instruction that does not rely on the operation completing in just one instruction cycle. We will cover this in more detail in Chapter 6.

Now consider a more complex program that sums an array of integers. One possible coding is shown in Figure 4-16.

```
! This program sums LENGTH numbers
! Register usage:   %r1  Length of array a
!                   %r2  Starting address of array a
!                   %r3  The partial sum
!                   %r4  Pointer into array a
!                   %r5  Holds an element of a

            .begin                  ! Start assembling
            .org 2048               ! Start program at 2048
a_start     .equ 3000               ! Address of array a

            ld    [length], %r1     ! %r1 ← length of array a
            ld    [address], %r2    ! %r2 ← address of a
            andcc %r3, %r0, %r3     ! %r3 ← 0
loop:       andcc %r1, %r1, %r0     ! Test # remaining elements
            be    done              ! Finished when length=0
            addcc %r1, -4, %r1      ! Decrement array length
            addcc %r1, %r2, %r4     ! Address of next element
            ld    %r4, %r5          ! %r5 ← Memory[%r4]
            ba    loop              ! Repeat loop. Notice that
                                    ! addcc on the next line executes in the delayed slot
            addcc %r3, %r5, %r3     ! Sum new element into %r3
done:       jmpl  %r15 + 4, %r0     ! Return to calling routine
length:     20                      ! 5 numbers (20 bytes) in a
address:    a_start
            .org a_start            ! Start of array a
a:          25                      ! length/4 values follow
            .end                    ! Stop assembling
```

Figure 4-16 An ARC program that sums five numbers.

As in the previous example, the program begins and ends with a .begin/.end pair. The .org pseudo-op instructs the assembler to begin assembling so that the assembled code is loaded into memory starting at location 2048. A pseudo-operand is created for the symbol a_start, which is assigned a value of 3000.

The program begins by loading the length of array a, which is given in bytes, into %r1. The program then loads the starting address of array a into %r2, and clears %r3, which will hold the partial sum. Register %r3 is cleared by ANDing it with %r0, which always holds the value 0. For that matter, %r0 can be ANDed with any register and the result will still be zero.

The label loop begins a loop that adds successive elements of array a into the partial sum (%r3) on each iteration. The loop starts by checking whether the number of remaining array elements to sum (%r1) is zero. It does this by ANDing %r1 with itself, which has the side effect of setting the condition codes. We are interested in the z flag, which will be set to 1 if %r1 = 0. The remaining flags (n, v, and c) are set accordingly. The value of z is tested by the be instruction. If there are no remaining array elements to sum, then the program branches to done, which returns to the calling routine (which might be the operating system, if this is the top level of a user program).

If the loop is not exited after the test for %r1 = 0, then %r1 is decremented by the width of a word in bytes (4) by adding −4. The starting address of array a (which is stored in %r2) and the index into a (%r1) are added into %r4, which then points to a new element of a. The element pointed to by %r4 is then loaded into %r5, which is added into the partial sum (%r3). The top of the loop is then revisited as a result of the ba loop statement. The variable length is stored after the instructions. The five elements of array a are placed in an area of memory according to the argument to the .org pseudo-op (location 3000).

4.9 Subroutine Linkage and Stacks

A subroutine (or a function) is a sequence of instructions that is invoked in a manner that makes it appear to be a single instruction in a high level view.
When a program calls a subroutine, control is passed from the program to the subroutine, which executes a sequence of instructions and then returns to the calling routine. There are a number of methods, which are referred to as calling conventions, for passing arguments to and from the called routine. The process of passing arguments between routines is referred to as subroutine linkage. One calling convention simply places the arguments in registers. The code in Figure 4-17 shows a program that loads two arguments into %r1 and %r2, calls subroutine add_1, and then retrieves the result from %r3. Subroutine add_1 takes its operands from %r1 and %r2, and places the result in %r3 before returning via the jmpl instruction. This method is fast and simple, but it will not work if the number of arguments that are passed between the routines exceeds the
