DLXsim A Simulator for DLX


Larry B. Hostetler
Brian Mirtich

November 26,

1 Introduction

Our project involved writing a simulator (DLXsim) for the DLX instruction set (as described in Computer Architecture, A Quantitative Approach by Hennessy and Patterson). DLXsim is an interactive program that loads DLX assembly programs and simulates the operation of a DLX computer on those programs, allowing both single-stepping and continuous execution through the DLX code. DLXsim also provides commands to set breakpoints, view and modify memory and registers, and print statistics on the execution of the program, so that the user can collect information on the run-time properties of a program. We expect that a major use for this tool will be in future CS 252 classes, as an aid to understanding this instruction set.

A complete overview of the interface provided by the simulator can be found in the user manual for DLXsim, which is included after this section. Later in this paper, a few sample runs of the simulator are also given.

Since the MIPS instruction set has many similarities with DLX, and a good MIPS simulator (available from Ousterhout) already exists, we decided it would be a better use of our time to modify that simulator to handle the DLX description. That simulator is built on top of the Tcl interface, which also gives the user a programmable command environment.

The main problem we encountered when rewriting the simulator was that there are a few fundamental differences between the DLX and MIPS architectures. The main differences we identified are listed below.

- In MIPS, branch and jump offsets are stored as a number of words, whereas DLX stores a number of bytes. This allows jumps on MIPS to go four times as far.
- MIPS jumps have a non-obvious approach to determining the destination address: the bits in the offset part of the instruction simply replace the lower bits in the program counter. DLX takes a more conventional approach: the offset is sign extended and then added to the program counter.
- In the MIPS architecture, conditional branches are based on the result of a comparison between any two registers. DLX has only two main conditional branch operations, which branch on whether a register is zero or non-zero.
- DLX provides load interlocks, while the MIPS 2000 does not.
- The MIPS 2000 provides instructions for unaligned accesses to memory, while DLX does not.
- The result of a MIPS multiply or divide ends up in two special registers (HI and LO), allowing 64 bit results; the result of a DLX multiply is placed in the chosen general purpose register, and must therefore fit into 32 bits.

Because of the large number of similarities between DLX and MIPS, we based our opcodes on those used by the MIPS machine (where MIPS had equivalent instructions). Where DLX had instructions with no MIPS equivalent, we grouped similar DLX instructions together and assigned them blocks of unused opcodes.
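The two jump-target rules in the list above can be put side by side in a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, `pc` stands for whatever program counter value the hardware uses as its base (real machines typically use the address of the following instruction), and a 26-bit jump field is assumed for both architectures.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def mips_jump_target(pc, index26):
    """MIPS style: the 26-bit field counts words and replaces the low 28 bits of the PC."""
    return (pc & 0xF0000000) | ((index26 & 0x03FFFFFF) << 2)


def dlx_jump_target(pc, offset26):
    """DLX style: the 26-bit field counts bytes, is sign extended, and is added to the PC."""
    return (pc + sign_extend(offset26, 26)) & 0xFFFFFFFF


# Reach of each scheme in bytes: word offsets cover four times as much as byte offsets.
MIPS_SPAN = 1 << 28   # 26 bits of word index
DLX_SPAN = 1 << 26    # 26 bits of signed byte offset (half backward, half forward)

print(hex(mips_jump_target(0x10000100, 0x40)))   # field replaces the low PC bits
print(hex(dlx_jump_target(0x100, 0x3FFFFF8)))    # field 0x3FFFFF8 encodes -8
print(MIPS_SPAN // DLX_SPAN)
```

The ratio of the two spans is where the "four times as far" figure comes from.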

Below, you will find the opcode numbers used for the DLX instructions. Register-register instructions have the SPECIAL opcode, and the instruction is specified in the lower six bits of the instruction word. Similarly, floating point instructions have the FPARITH opcode, and the actual instruction is again found in the lower six bits of the word.

Main opcodes

      $00 $01 $02 $03 $04 $05 $06 $07
$00   SPECIAL FPARITH J JAL BEQZ BNEZ BFPT BFPF
$08   ADDI ADDUI SUBI SUBUI ANDI ORI XORI LHI
$10   RFE TRAP JR JALR
$18   SEQI SNEI SLTI SGTI SLEI SGEI
$20   LB LH LW LBU LHU LF LD
$28   SB SH SW SF SD
$30   SEQUI SNEUI SLTUI SGTUI SLEUI SGEUI

Special opcodes (Main opcode = $00)

      $00 $01 $02 $03 $04 $05 $06 $07
$00   SLLI SRLI SRAI SLL SRL SRA
$08   TRAP
$10   SEQU SNEU SLTU SGTU SLEU SGEU
$18   MULT MULTU DIV DIVU
$20   ADD ADDU SUB SUBU AND OR XOR
$28   SEQ SNE SLT SGT SLE SGE
$30   MOVI2S MOVS2I MOVF MOVD MOVFP2I MOVI2FP

Floating Point opcodes (Main opcode = $01)

      $00 $01 $02 $03 $04 $05 $06 $07
$00   ADDF SUBF MULTF DIVF ADDD SUBD MULTD DIVD
$08   CVTF2D CVTF2I CVTD2F CVTD2I CVTI2F CVTI2D
$10   EQF NEF LTF GTF LEF GEF
$18   EQD NED LTD GTD LED GED

The manual entry for DLXsim follows.

DLXSIM User Commands

NAME
     DLXsim - Simulator and debugger for DLX assembly programs

SYNOPSIS
     dlxsim [-al#] [-au#] [-dl#] [-du#] [-ml#] [-mu#]

OPTIONS
     -al#   Select the latency for a floating point add (in clocks).
     -au#   Select the number of floating point add units.
     -dl#   Select the latency for a floating point divide.
     -du#   Select the number of floating point divide units.
     -ml#   Select the latency for a floating point multiply.
     -mu#   Select the number of floating point multiply units.

DESCRIPTION
     DLXsim is an interactive program that loads DLX assembly programs and simulates the operation of a DLX computer on those programs. When DLXsim starts up, it looks for a file named .dlxsim in the user's home directory. If such a file exists, DLXsim reads it and processes it as a command file. DLXsim also checks for a .dlxsim file in the current directory, and executes the commands in it if the file exists. Finally, DLXsim loops forever reading commands from standard input and printing results on standard output.

NUMBERS
     Whenever DLXsim reads a number, it will accept the number in decimal notation, in hexadecimal notation if the first two characters of the number are 0x (e.g. 0x3acf), or in octal notation if the first character is 0 (e.g. 0342). Two DLXsim commands accept only floating point numbers from the user; these are fget and fput, which are described later.

ADDRESS EXPRESSIONS
     Many of DLXsim's commands take as input an expression identifying a register or memory location. Such values are indicated with the term address in the command descriptions below. Where register names are acceptable, any of the names r0 through r31 and f0 through f31 may be used. The names $0 through $31 may also be used (instead of r0 through r31), but the dollar signs are likely to cause confusion with Tcl variables, so it is safer to use r instead of $. The name pc may be used to refer to the program counter.

     Symbolic expressions may be used to specify memory addresses. The simplest form of such an expression is a number, which is interpreted as a memory address. More generally, address expressions may consist of numbers, symbols (which must be defined in the assembly files currently loaded), the operators *, /, %, +, -, <<, >>, &, |, and ^ (which have the same meanings and precedences as in C), and parentheses for grouping.

COMMANDS
     In addition to all of the built-in Tcl commands, DLXsim provides the following application-specific commands:

     asm instruction [address]
          Treats instruction as an assembly instruction and returns a hexadecimal value equivalent to instruction. Some instructions, such as relative branches, will be assembled differently depending on where in memory the instruction will be stored. The address argument may be used to indicate where the instruction would be stored; if omitted, it defaults to 0.

     fget address [flags]
          Return the values of one or more memory locations or registers. Address identifies a memory location or register, and flags, if present, consists of a number and/or set of letters, all concatenated together. If the number is present, it indicates how many consecutive values to print (the default is 1). If flag characters are present, they have the following interpretation:

          d    Print values as double precision floating point numbers.
          f    Print values as single precision floating point numbers (default).

     fput address number [precision]
          Store number in the register or memory location given by address. If precision is d, the number is stored as a double precision floating point number (in two words). If precision is f or no precision is given, the number is stored as a single precision floating point number.

     get address [flags]
          Similar to fget above, this command is for all types except floating point. If flag characters are present, they have the following interpretation:

          B    Print values in binary.
          b    When printing memory locations, treat each byte as a separate value.
          c    Print values as ASCII characters.
          d    Print values in decimal.
          h    When printing memory locations, treat each halfword as a separate value.
          i    Print values as instructions in the DLX assembly language.
          s    Print values as null-terminated ASCII strings.
          v    Instead of printing the value of the memory location referred to by address, print the address itself as the value.
          w    When printing memory locations, treat each word as a separate value.
          x    Print values in hexadecimal (default).

          To interpret numbers as single or double precision floating point, use the fget command.

     go [address]
          Start simulating the DLX machine. If address is given, execution starts at that memory address. Otherwise, it continues from wherever it left off previously. This command does not complete until simulated execution stops. The return value is an information string about why execution stopped and the current state of the machine.

     load file file file ...
          Read each of the given files. Treat them as DLX assembly language files and load memory as indicated in the files. Code (text) is normally loaded starting at address 0x100, but the codestart variable may be used to set a different starting address. Data is normally loaded starting at address 0x1000, but a different starting address may be specified in the datastart variable. The return value is either an empty string or an error message describing problems in reading the files. A list of directives that the loader understands is in a later section of this manual.

     put address number
          Store number in the register or memory location given by address. The return value is an empty string. To store floating point numbers (single or double precision), use the fput command.

     quit
          Exit the simulator.

     stats [reset] [stalls] [opcount] [pending] [branch] [hw] [all]
          This command will dump various statistics collected by the simulator on the DLX code that has been run so far. Any combination of options may be selected. The options and their results are as follows:

          reset     Reset all of the statistics.
          stalls    Show the number of load stalls and stalls while waiting for a floating point unit to become available or for the result of a previous operation to become available.
          opcount   Show the number of each operation that has been executed.
          pending   Show all floating point operations currently being handled by the floating point units as well as what their results will be and where they will be stored.
          branch    Show the percentage of branches taken and not-taken.
          hw        Show the current hardware setup for the simulated machine.
          all       Equivalent to choosing all options except reset. This is the default.

     step [address]
          If no address is given, the step command executes a single instruction, continuing from wherever execution previously stopped. If address is given, then the program counter is changed to point to address, and a single instruction is executed from there. In either case, the return value is an information string about the state of the machine after the single instruction has been executed.

     stop [option args]
          This command may take any of the forms described below:

          stop
               Arrange for execution of DLX code to stop as soon as possible. If a simulation isn't in progress then this command has no effect.
               This command is most often used in the command argument for the stop at command. Returns an empty string.

          stop at address [command]
               Arrange for command (a DLXsim command string) to be executed whenever the memory address identified by address is read, written, or executed. If command is not given, it defaults to stop, so that execution stops whenever address is accessed. A stop applies to the entire word containing address: the stop will be triggered whenever any byte of the word is accessed. Stops are not processed during the step commands or the first instruction executed in a go command. Returns an empty string.

          stop info
               Return information about all stops currently set.

          stop delete number number number ...
               Delete each of the stops identified by the number arguments. Each number should be an identifying number for a stop, as printed by stop info. Returns an empty string.

ASSEMBLY FILE FORMAT
     The assembler built into DLXsim, invoked using the load command, accepts standard format DLX assembly language programs. The file is expected to contain lines of the following form:

     - Labels are defined by a group of non-blank characters starting with either a letter, an underscore, or a dollar sign, and followed immediately by a colon. They are associated with the next address to which code in the file will be stored. Labels can be accessed anywhere else within that file, and in files loaded after that if the label is declared as .global (see below).
     - Comments are started with a semicolon, and continue to the end of the line.
     - Constants can be entered either with or without a preceding number sign.
     - The format of instructions and their operands is as shown in the Computer Architecture book.

     While the assembler is processing an assembly file, the data and instructions it assembles are placed in memory based on either a text (code) or data pointer. Which pointer is used is selected not by the type of information, but by whether the most recent directive was .data or .text. The program initially loads into the text segment.

     The assembler supports several directives which affect how it loads the DLX's memory. These should be entered in the place where you would normally place the instruction and its arguments. The directives currently supported by DLXsim are:

     .align n
          Cause the next data/code loaded to be at the next higher address with the lower n bits zeroed (the next closest address greater than or equal to the current address that is a multiple of 2^n).

     .ascii string1, string2, ...
          Store the strings listed on the line in memory as a list of characters. The strings are not terminated by a 0 byte.

     .asciiz string1, string2, ...
          Similar to .ascii, except each string is followed by a 0 byte (like C strings).

     .byte byte1, byte2, ...
          Store the bytes listed on the line sequentially in memory.

     .data [address]
          Cause the following code and data to be stored in the data area. If an address was supplied, the data will be loaded starting at that address; otherwise, the last value for the data pointer will be used. If we were just reading code based on the text (code) pointer, store that address so that we can continue from there later (on a .text directive).

     .double number1, number2, ...
          Store the numbers listed on the line sequentially in memory as double precision floating point numbers.

     .float number1, number2, ...
          Store the numbers listed on the line sequentially in memory as single precision floating point numbers.

     .global label
          Make the label available for reference by code found in files loaded after this file.

     .space size
          Move the current storage pointer forward size bytes (to leave some empty space in memory).

     .text [address]
          Cause the following code and data to be stored in the text (code) area. If an address was supplied, the data will be loaded starting at that address; otherwise, the last value for the text pointer will be used. If we were just reading data based on the data pointer, store that address so that we can continue from there later (on a .data directive).

     .word word1, word2, ...
          Store the words listed on the line sequentially in memory.

VARIABLES
     DLXsim uses or sets the following Tcl variables:

     codestart
          If this variable exists, it indicates where to start loading code in load commands.

     datastart
          If this variable exists, it indicates where to start loading data in load commands.

     inscount
          DLXsim uses this variable to keep a running count of the total number of instructions that have been simulated so far.

     prompt
          If this variable exists, it should contain a DLXsim command string. DLXsim will execute the command in this string before printing each prompt, and use the result as the prompt string to print. If this variable doesn't exist, or if an error occurs in executing its contents, then the prompt (dlxsim) is used.

SEE ALSO
     Computer Architecture, A Quantitative Approach, by John L. Hennessy and David A. Patterson.

KEYWORDS
     DLX, debug, simulate
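The rounding performed by the .align directive described in the manual can be illustrated with a few lines of Python. This is a minimal sketch of the arithmetic only, assuming the "lower n bits zeroed" reading of the directive; the helper name align_up is hypothetical and is not part of DLXsim.

```python
def align_up(address, n):
    """Return the smallest address >= `address` whose low n bits are all zero."""
    mask = (1 << n) - 1               # the n low bits that must end up cleared
    return (address + mask) & ~mask   # round up, then clear the low bits


# With n = 2 the storage pointer moves to the next word boundary,
# and with n = 3 to the next double word boundary.
for addr, n in [(0x1001, 2), (0x1004, 2), (0x1005, 3)]:
    print(hex(addr), n, hex(align_up(addr, n)))
```

An address that is already aligned is left unchanged, which matches the "greater than or equal to" wording in the directive's description.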

2 Interactive Sessions with DLXsim

To illustrate some of the features of DLXsim, this section describes two interactive sessions using examples taken from Chapter 6 of Computer Architecture, A Quantitative Approach by Hennessy and Patterson. The programs used are on pages 315 and 317. The ADDD instructions have been replaced with MULTD instructions, however, to show the effects of a slightly longer latency. Also, TRAP instructions have been added to terminate execution of the programs when simulating.

2.1 Sample Datafile

The examples which follow operate on arrays of numbers. A common datafile is used for input to the programs. This datafile is named fdata.s and is shown below:

            .data 0
            .global a
    a:      .double
            .global x
    x:      .double 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
            .double 17,18,19,20,21,22,23,24,25,26,27
            .global xtop
    xtop:   .double 28

The .data directive specifies that the data should be loaded in at location 0. The .global directive adds the specified labels to a global symbol table so that other assembly files can access them. The .double directive stores double precision data to memory.

2.2 First Example

The first example uses the program at the bottom of page 315 (with the ADDD replaced by MULTD). The program is shown below.

            ld      f2,a
            add     r1,r0,xtop
    loop:   ld      f0,0(r1)        ; load stall occurs here
            multd   f4,f0,f2        ; 4 FP stalls
            sd      0(r1),f4
            sub     r1,r1,#8
            bnez    r1,loop
            nop                     ; branch delay slot
            trap    #0              ; terminate simulation

The simulator is invoked by typing dlxsim at the system prompt.

    % dlxsim

First the datafile is loaded, using the load command:

    (dlxsim) load fdata.s

Next, the program may be loaded. The program above was created with an editor and saved in the file f1.s. It is loaded in the same way as the datafile.

    (dlxsim) load f1.s

To verify that the program has been loaded, the get command can be used to examine memory. The program is loaded at location 256 by default. The second parameter to get indicates how many words to dump. The i suffix tells get to dump the contents in instruction format (i.e. produce a disassembly).

    (dlxsim) get 256 9i
    start:      ld f2,a(r0)
    start+0x4:  addi r1,r0,0xe0
    loop:       ld f0,a(r1)
    loop+0x4:   multd f4,f0,f2
    loop+0x8:   sd a(r1),f4
    loop+0xc:   subi r1,r1,0x8
    loop+0x10:  bnez r1,loop
    loop+0x14:  nop
    loop+0x18:  trap 0x0

To make sure that the statistics are all cleared (as they should be when DLXsim is first invoked), use the stats command with the relevant parameters:

    (dlxsim) stats stalls branch pending hw
    Memory size: bytes.
    Floating Point Hardware Configuration
     1 add/subtract units, latency = 2 cycles
     1 divide units, latency = 19 cycles
     1 multiply units, latency = 5 cycles
    Load Stalls = 0
    Floating Point Stalls = 0
    No branch instructions executed.
    Pending Floating Point Operations:
     none.

The hw specifier causes the memory size and floating point hardware information to be dumped. The stalls specifier causes the total load stalls and floating point stalls to be displayed. The branch specifier causes the branch information (taken vs. not taken) to be displayed; in this case no branches have been executed yet. Finally, the pending specifier causes the pending operations in the floating point units to be displayed (none in this case).

Below, the first four instructions are executed using the step command:

    (dlxsim) step 256
    stopped after single step, pc = start+0x4: addi r1,r0,0xe0
    (dlxsim) step
    stopped after single step, pc = loop: ld f0,a(r1)
    (dlxsim) step
    stopped after single step, pc = loop+0x4: multd f4,f0,f2
    (dlxsim) step
    stopped after single step, pc = loop+0x8: sd a(r1),f4

The stats command can produce some more interesting results at this point.

    (dlxsim) stats stalls pending
    Load Stalls = 1
    Floating Point Stalls = 0
    Pending Floating Point Operations:
     multiplier #1 : will complete in 4 more cycle(s) ==> F4:F5

A load stall occurred between the third and fourth instructions because of the F0 dependency. The multiply instruction has issued, and is being processed in multiplier unit #1. It will complete and store the double precision value into F4 and F5 in four more clock cycles. The double precision value in F4 can be displayed using the fget command with a d specifier (for double precision).

    (dlxsim) fget f4 d
    f4:

As expected, F4 hasn't received its value yet. Executing one more instruction will change the statistics:

    (dlxsim) step
    stopped after single step, pc = loop+0xc: subi r1,r1,0x8
    (dlxsim) stats stalls pending
    Load Stalls = 1
    Floating Point Stalls = 4
    Pending Floating Point Operations:
     none.

Since the SD instruction used the result from the multiply instruction, the multiply was completed before the SD was executed. The four floating point stalls required for the multiply to complete were recorded as well. If F4 is examined now, its value after the writeback is displayed.

    (dlxsim) fget f4 d
    f4:

To execute the program to completion, the go command can be used. When the TRAP instruction is detected, the simulation will stop.

    (dlxsim) go
    TRAP #0 received

To view the cumulative stall and branch information, the stats command can be used.

    (dlxsim) stats stalls branch
    Load Stalls = 28
    Floating Point Stalls = 112
    Branches: total 28, taken 27 (96.43%), untaken 1 (3.57%)

The loop executed 28 times. There was a single load stall per iteration, for a total of 28 load stalls. There were 4 floating point stalls per iteration, for a total of 112 floating point stalls. Finally, the conditional branch at the bottom of the loop was taken 27 times, and fell through on the final iteration. All these statistics are reflected above.

To verify that the program operated properly, the memory locations containing the original data can be examined with the fget command. The original data was stored in the 28 double words beginning at location 8.

    (dlxsim) fget 8 28d
    x:
    x+0x8:
    x+0x10:
    etc....
    x+0xc8:
    x+0xd0:
    xtop:

As expected, the initial integer values have all been multiplied by π.

2.3 Second Example

The second example is from page 317 of the aforementioned text. It demonstrates the effects of unrolling loops when multiple execution units are available. The program, which is shown below, performs the same operations on the list of numbers as the previous example program.

    start:  ld      f2,a
            add     r1,r0,xtop
    loop:   ld      f0,0(r1)
            ld      f6,-8(r1)
            ld      f10,-16(r1)
            ld      f14,-24(r1)
            multd   f4,f0,f2
            multd   f8,f6,f2
            multd   f12,f10,f2
            multd   f16,f14,f2      ; FP stall here
            sd      0(r1),f4
            sd      -8(r1),f8
            sd      -16(r1),f12
            sub     r1,r1,#32
            bnez    r1,loop
            sd      8(r1),f16       ; branch delay slot
            trap    #0

To take full advantage of this unrolled loop, DLXsim can be invoked with a command line argument specifying that 4 floating point multiply units should be included in the hardware configuration.

    % dlxsim -mu4
    (dlxsim) stats hw
    Memory size: bytes.
    Floating Point Hardware Configuration
     1 add/subtract units, latency = 2 cycles
     1 divide units, latency = 19 cycles
     4 multiply units, latency = 5 cycles

After loading the data and program files, the step command can be used to execute the first 10 instructions. At this point, the last MULTD instruction has just issued. The stats command can display the stalls and pending operations.

    (dlxsim) stats stalls pending
    Load Stalls = 0
    Floating Point Stalls = 0
    Pending Floating Point Operations:
     multiplier #0 : will complete in 1 more cycle(s) ==> F4:F5
     multiplier #1 : will complete in 2 more cycle(s) ==> F8:F9
     multiplier #2 : will complete in 3 more cycle(s) ==> F12:F13
     multiplier #3 : will complete in 4 more cycle(s) ==> F16:F17

It is interesting to see what happens after the next instruction is executed.

    (dlxsim) step
    stopped after single step, pc = loop+0x24: sd 0xfff8(r1),f8
    (dlxsim) stats stalls pending
    Load Stalls = 0
    Floating Point Stalls = 1
    Pending Floating Point Operations:
     multiplier #2 : will complete in 1 more cycle(s) ==> F12:F13
     multiplier #3 : will complete in 2 more cycle(s) ==> F16:F17

Since the SD instruction was dependent on the first MULTD instruction, a floating point stall occurred so the MULTD could complete. This added stall cycle also allowed the second MULTD to complete. The MULTDs have caught up with the SDs, and no more stalls will occur on this iteration. This is the reason loop unrolling works.

To run the program to completion, the go command can be used.

    (dlxsim) go
    TRAP #0 received

To dump all the statistics gathered, the stats command is used without any parameters.

    (dlxsim) stats
    Memory size: bytes.
    Floating Point Hardware Configuration
     1 add/subtract units, latency = 2 cycles
     1 divide units, latency = 19 cycles
     4 multiply units, latency = 5 cycles
    Load Stalls = 0
    Floating Point Stalls = 7
    Branches: total 7, taken 6 (85.71%), untaken 1 (14.29%)
    Pending Floating Point Operations:
     none.

    INTEGER OPERATIONS
    ==================
    ADD        0   ADDI       1   ADDU       0   ADDUI      0   AND        0
    ANDI       0   BEQZ       0   BFPF       0   BFPT       0   BNEZ       7
    DIV        0   DIVU       0   J          0   JAL        0   JALR       0
    JR         0   LB         0   LBU        0   LD        29   LF         0
    LH         0   LHI        0   LHU        0   LW         0   MOVD       0
    MOVF       0   MOVFP2I    0   MOVI2FP    0   MOVI2S     0   MOVS2I     0
    MULT       0   MULTU      0   OR         0   ORI        0   RFE        1
    SB         0   SD        28   SEQ        0   SEQI       0   SF         0
    SGE        0   SGEI       0   SGT        0   SGTI       0   SH         0
    SLE        0   SLEI       0   SLL        0   SLLI/NOP   0   SLT        0
    SLTI       0   SNE        0   SNEI       0   SRA        0   SRAI       0
    SRL        0   SRLI       0   SUB        0   SUBI       7   SUBU       0
    SUBUI      0   SW         0   TRAP       1   XOR        0   XORI       0
    Total integer operations = 74

    FLOATING POINT OPERATIONS
    =========================
    ADDD       0   ADDF       0   CVTD2F     0   CVTD2I     0   CVTF2D     0
    CVTF2I     0   CVTI2D     0   CVTI2F     0   DIVD       0   DIVF       0
    EQD        0   EQF        0   GED        0   GEF        0   GTD        0
    GTF        0   LED        0   LEF        0   LTD        0   LTF        0
    MULTD     28   MULTF      0   NED        0   NEF        0   SUBD       0
    SUBF       0
    Total floating point operations = 28

    Total operations = 102
    Total cycles = 109

The dynamic counts for all instructions are shown, as well as the statistics previously discussed. There are no load stalls and only seven floating point stalls in this case, compared to 28 load stalls and 112 floating point stalls in the first example. This is the result of unrolling the loop four times and providing four multiply units in hardware. An estimate of the clocks per instruction (CPI) can be obtained by dividing the total cycles (109) by the total operations (102), which gives a CPI of about 1.07.

The two examples above give only a flavor of the types of operations which may be done in DLXsim. The possibilities are endless.
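The figures reported in the two sessions can be cross-checked with a little arithmetic. The Python sketch below simply recomputes them from the numbers printed above; the variable names are illustrative, and the observation that total cycles equal total operations plus stalls is read off the second example's output rather than taken from any documented property of DLXsim.

```python
def percent(part, whole):
    """Percentage rounded to two decimals, in the style of the stats output."""
    return round(100.0 * part / whole, 2)


# First example: 28 loop iterations, one load stall and four FP stalls each.
iterations = 28
first_load_stalls = iterations * 1
first_fp_stalls = iterations * 4
first_taken = percent(27, 28)

# Second example: totals reported by the final stats command.
integer_ops, fp_ops = 74, 28
total_ops = integer_ops + fp_ops
load_stalls, fp_stalls = 0, 7
total_cycles = 109
cpi = total_cycles / total_ops
second_taken = percent(6, 7)

print(first_load_stalls, first_fp_stalls, first_taken)
print(total_ops, total_ops + load_stalls + fp_stalls, round(cpi, 3), second_taken)
```

In the second run the seven stall cycles account exactly for the gap between 102 operations and 109 cycles.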

3 Internal Operation

Some information concerning how DLXsim operates internally may be useful to some users, particularly those who wish to modify or enhance the simulator. This section provides an overview of the simulator and a discussion of the underlying data structures used. This information is not necessary to use DLXsim. All of the code discussed below is contained in the file sim.c.

3.1 Instruction Tables

DLXsim contains four tables which hold information about the DLX instruction set. The first is optable. This table contains 64 entries corresponding to the 64 possible values of the opcode field. Each entry consists of an instruction-format pair. For example, the value of optable[5] is {OP_BNEZ, IFMT}, indicating that opcode 5 is a branch not equal to zero instruction, which uses the I-type format. Several entries in this table have OP_RES as the instruction. These entries are reserved for future extensions to the DLX instruction set.

The zero opcode indicates that a different table should be used to identify the instruction. A second table called specialtable handles this case. This table holds all the register-register operations. The format is not specified explicitly for these instructions (as it was in optable) because they are all R-type format. These instructions all contain a zero in the opcode field and a function encoding in the lower six bits of the instruction word. There is also room in this table for expansion by using entries currently containing OP_RES.

An opcode of one indicates a floating point arithmetic operation. A third table, FParithTable, handles these instructions. As with specialtable, all instructions in this table have R-type format. The exact operation is again specified by the lower six bits of the instruction word, which are used to index into this table. Currently 32 entries contain OP_RES and are available for future expansion to the floating point instruction set.

The final table is operationnames. This table contains a list of all the integer instruction names followed by the floating point instruction names. Each group is arranged alphabetically. This table is used to print out the names of the instructions when a dynamic instruction count is requested.

3.2 Simulator Support Functions

This subsection describes the various routines which handle simulator commands and provide support for the main simulator code.

The function Sim_Create initializes a DLX processor structure and is invoked when DLXsim is first started. The memory size of the machine and the floating point hardware specification (i.e. unit quantities and latencies) are specified as parameters.

Two functions, statsreset and Sim_DumpStats, process the stats command in DLXsim. The former resets all the statistics to zero, and the latter processes requests for various statistics. The statistics currently taken during simulation are: load stalls, floating point stalls, dynamic instruction counts, and conditional branch behavior. In addition, the floating point hardware and pending floating point operations can also be examined. See the description of the stats command for more information on how to request and reset the various statistics.

The functions Sim_GoCmd and Sim_StepCmd process the simulator's go and step commands, respectively. See the description of these commands for more information on using them.

The functions ReadMem and WriteMem provide the interface between the simulator and the DLX memory structure. They ensure that the address accessed is valid, which means it must be within the memory's range and it must be on a word boundary. Otherwise, appropriate error handling occurs.

3.3 Compilation of Instructions

To improve efficiency, DLXsim compiles the instructions as it first encounters them. To understand how this works, it is necessary to examine the structure of a single word of the DLX memory. A single memory word contains several fields: value, opcode, rs1, rs2, rd, and extra. A DLX program to be simulated is written in DLX assembly language. Such a program is automatically assembled into machine code as it is loaded.

15 The actual machine codes are stored in the value fields of the memory words. The value field represents the number actually stored at a particular memory word. The opcode field of each memory word is initially set to the special value OP NOT COMPILED. When the simulator executes an instruction, it first examines the opcode field of the memory word pointed to by the program counter. If this field is a valid opcode (specified in the tables discussed above), the appropriate action for that instruction occurs. If the opcode field contains the value OP NOT COMPILED, the function Compile is invoked. This function looks at the actual word stored in the value field. The bits corresponding to the opcode and function fields are examined to determine what the instruction is. Depending on the instruction type, the two source register specifiers and destination register specifier may be extracted and stored in the fields rs1, rs2, and rd. If a 16-bit immediate value is present (for I-type instructions) or a 26-bit offset is present (for J-type instructions), this value is extracted and stored in the extra field of the memory word. The special code for the instruction is stored in the opcode field of the word, which previously contained the value OP NOT COMPILED. These special codes are not the real DLX opcodes, but rather the pseudo-opcodes defined in the file dlx.h. When a compiled instruction is subsequently encountered, no shifting or masking operations are required to access the register specifiers or immediate values; the required information is already present in the appropriate fields of the memory words (rs1, rs2, rd, and extra). This allows the simulator to execute much faster. The actual machine code for the instruction can still be examined through the value field, and this is the value printed when the word is examined with the get command. 3.4 Main Simulation Loop Simulate is the main function of the simulator. 
The heart of this function is basically a very large switch statement, based on the opcode field of the memory word pointed to by the program counter. There is a case for each integer and floating point instruction. Simulate loops through the basic fetch-decode-execute cycle until a stop command is received or some other exceptional condition occurs.

3.4.1 Load Stalls

DLX has a latency of one cycle on load instructions. In other words, the result is not yet present in the destination register on the cycle immediately following the load instruction. To address this problem, DLX has load interlocks which cause the pipeline to stall if an instruction immediately following a load instruction reads the value in the load's destination register. DLXsim records the occurrence of these load stall cycles for statistical purposes.

Several variables are set during the processing of the following load instructions: LB, LBU, LH, LHU, LW, LF, and LD. LHI is not included since the value to be loaded is contained in the instruction and there is no extra latency. For the other load instructions, the destination register is stored in loadreg1; for a load double, the second destination register is stored in loadreg2. The corresponding values to be stored in these registers (on the next cycle) are stored in loadvalue1 and loadvalue2. When an instruction that reads registers (such as an ADD instruction) is encountered during simulation, the contents of loadreg1 and loadreg2 are examined before any other action occurs. If the instruction reads either of the registers specified by these variables, that is, a register loaded by the previous instruction, a load stall is detected and tallied. Different register fields must be checked for different instructions. All the load stall detection logic is contained in the macros at the top of the Simulate function definition.

Of interest is the fact that while load stalls would slow down the execution speed of a real DLX machine, they do not affect the performance of the simulator.
This is because load stall cycles are not actually simulated. Instead, it is simply noted that a load stall occurred at a particular point, and execution proceeds normally.

3.4.2 Dynamic Instruction Counts

Statistics on the number of each type of instruction executed are also recorded during simulation. This is a simple operation of incrementing the appropriate element of the array operationcount, which is indexed by the pseudo-opcodes discussed above. The information in the array can be accessed by the stats command.
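The load-interlock bookkeeping and the per-instruction counters described above might be modelled roughly as follows. This is a simplified Python sketch, not the simulator's C code: the class Machine, the function step, and the attribute names are hypothetical, only LW and ADD are handled, a single pending load register is tracked (the simulator keeps two, to cover load double), and using register number 0 to mean "no pending load" is an assumption. The extra cycle charged per stall follows the later remark that cyclecount is incremented an extra time when a load stall is detected.

```python
from collections import Counter


class Machine:
    def __init__(self, memory=None):
        self.regs = [0] * 32
        self.memory = dict(memory or {})  # address -> word value
        self.load_reg = 0                 # pending load target; 0 = none (assumption)
        self.load_value = 0               # value that lands in load_reg next cycle
        self.load_stalls = 0
        self.cycle_count = 0
        self.operation_count = Counter()  # dynamic count per opcode


def step(m, op, rd, rs1, third):
    """Run one decoded instruction.

    LW  rd, third(rs1)  -> third is an address offset
    ADD rd, rs1, third  -> third is the second source register
    """
    m.operation_count[op] += 1
    m.cycle_count += 1

    # Interlock check comes first: which registers does this instruction read?
    reads = (rs1, third) if op == "ADD" else (rs1,)
    if m.load_reg != 0 and m.load_reg in reads:
        m.load_stalls += 1   # tally the stall ...
        m.cycle_count += 1   # ... and charge one extra cycle for it

    # The previous load's value now becomes visible.
    if m.load_reg != 0:
        m.regs[m.load_reg] = m.load_value
        m.load_reg = 0

    if op == "LW":
        m.load_reg = rd
        m.load_value = m.memory.get(m.regs[rs1] + third, 0)
    elif op == "ADD":
        if rd != 0:  # R0 always reads as zero
            m.regs[rd] = (m.regs[rs1] + m.regs[third]) & 0xFFFFFFFF
    else:
        raise ValueError(f"unsupported opcode {op!r}")
```

Note that the stall is only counted; the model never replays or delays the dependent instruction, which mirrors the observation that load stalls cost the simulator nothing at run time.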

3.4.3 Conditional Branch Behavior

DLXsim also keeps statistics on the conditional branch behavior during program execution. There are four instructions in this category: BEQZ, BNEZ, BFPT, and BFPF. The latter two instructions are branches based on the status of the floating point condition register. Two fields of the DLX machine structure, branchyes and branchno, record how many conditional branches were taken and not taken, respectively. These values are accessible via the stats command.

3.5 Floating Point Execution Control

A large portion of the DLXsim code is devoted to the floating point side of the machine. The floating point scheme currently implemented requires instructions to issue in order, but they may complete out of order. In addition to managing the allocation of the floating point units, DLXsim must also handle all the hazard checking associated with out of order completion of instructions. By requiring instructions to issue in order, the write-after-read (WAR) hazard is avoided. The three hazards which may occur are read-after-write (RAW) hazards, write-after-write (WAW) hazards, and structural hazards.

3.5.1 Floating Point Data Structures

The variables and data structures which manage the floating point execution are all declared in the file dlx.h as part of the basic DLX structure. The variables num_add_units, num_div_units, and num_mul_units specify how many of each type of floating point execution unit are available on the machine. The variables fp_add_latency, fp_div_latency, and fp_mul_latency specify the corresponding latencies (in clock cycles) of each of the execution units. All six of these variables have default values which may be overridden via command line parameters when DLXsim is invoked. The variable FPstatusReg is the status register which is examined on a BFPT or BFPF instruction. The various floating point set instructions (EQF, NED, etc.) write to this register.
The array fp_add_units contains the status of all the floating point adders during execution. If fp_add_units[i] is zero, adder i is available. A non-zero value means that the unit is currently performing an operation; the value specifies the clock cycle when the operation will complete. The arrays fp_div_units and fp_mul_units contain analogous information for the floating point dividers and multipliers. All three structures can be accessed through the array fp_units, which is an array of pointers to the three execution unit status arrays.

The array waiting_FPRs contains 32 elements, corresponding to the 32 floating point registers in DLX. A zero in waiting_FPRs[i] means floating point register Fi can be read from; it contains its most current value. A non-zero value means register Fi is the destination register of a pending floating point operation (one which has issued but not yet completed). Attempting to read or write to such a register means a hazard condition exists. The non-zero value indicates the cycle at which the writeback to the register will occur.

The variable FPopsList points to the chain of pending floating point operations. Each item in this chain is of type FPop, a structure with the following fields:

  type      Indicates the type of operation. Normally this is implied by what type of floating point unit is executing the operation; however, adders can perform both additions and subtractions.
  unit      The unit number of the execution unit which is executing the operation.
  dest      The destination register for the operation. For a double precision operation, this is the lower-numbered destination register.
  isdouble  Indicates if the operation is single or double precision.
  result    An array of two floats used to store the result of the operation (only the first element is used for single precision operations). The result is actually computed at the time of issue.
  ready     The cycle when the operation will complete and writeback will occur.
  nextptr   Points to the next FPop in the chain of pending operations.

To maximize performance, the list of pending floating point operations is sorted based on when the operations will complete. The operation which will complete soonest is at the head of the list. The variable checkfp is a copy of the ready field of the first floating point operation on the pending operation list. If its value is zero, no floating point operations are pending. Otherwise checkfp indicates when the next (soonest) floating point operation will complete. This provides for very quick checking in the fast path of the simulator: only one value needs to be checked in a cycle when no writebacks should occur.

Many of the previously discussed structures refer to a clock cycle count when a particular operation will complete. The current clock cycle is kept in the variable cyclecount. This variable is incremented each time the simulator executes its main loop. It is also incremented an extra time when a load stall is detected, since the floating point units are still executing during a load stall. When the cycle count reaches a large value specified by the constant CYC_CNT_RESET, cyclecount is reset back to a small number (5), and all references to clock cycles in the floating point data structures are adjusted accordingly. This operation is necessary to prevent cyclecount from overflowing, becoming negative, and thereby wreaking havoc on the sorted list of pending operations. Making cyclecount an unsigned integer does not work, since there are still problems with sorting the pending operations when cycle counts wrap around to zero.

3.5.2 Issuing Floating Point Operations

The function FPissue initiates a floating point operation. It is called from eight of the switch cases in the main loop: ADDF, DIVF, MULF, SUBF, ADDD, DIVD, MULD, and SUBD. When a floating point instruction issues, three hazard conditions must be checked.
A structural hazard occurs if a floating point unit of the required type is not available. A RAW hazard occurs if one of the source operands is the destination of a pending floating point operation. Finally, a WAW hazard occurs if the destination register is the destination register of a pending floating point operation. All three conditions can be checked by examining the floating point data structures discussed above.

If any of these hazards are present (and there may be more than one), the current instruction is not issued. Instead a non-zero value is returned which indicates the soonest cycle when one of the hazard conditions will be over. This may be a cycle when one of the floating point units will complete its current operation (eliminating a structural hazard), or when some register will be written back (eliminating a RAW or WAW hazard). When the caller receives a non-zero value from FPissue, the appropriate number of floating point stalls are simulated by adjusting the variables cyclecount and FPstalls. The function FPwriteBack (see below) is called to perform any writebacks which may now occur. Then FPissue is re-invoked. If another hazard condition exists, the whole process may be repeated, but eventually all of the hazard conditions will terminate.

If no hazards are present, the instruction is issued. That is, a new FPop structure is placed in the appropriate spot in the pending operations list. The appropriate elements of waiting_FPRs are also set to indicate that the destination registers are waiting for values to be written back. FPissue returns a zero value to indicate a successful issue, and the simulation continues.

3.5.3 Writing Back Floating Point Results

The function FPwriteBack is the second function involved in floating point execution. It is called whenever cyclecount reaches checkfp, indicating that a result is ready to be written back on the current cycle. FPwriteBack does exactly that.
It removes the first FPop from the list of pending operations, and stores the result (computed at time of issue) in the appropriate register(s). It also zeroes the appropriate element(s) in waiting_FPRs. Since more than one operation may complete on the same cycle, FPwriteBack repeats this process until the value in the ready field of the operation at the head of the list exceeds the current value in cyclecount.

3.5.4 Handling RAW and WAW Hazards

The function FPissue (discussed above) handles the RAW and WAW hazards when a new floating point operation is issued. However, several other instructions can generate such hazards. Any instruction which reads from or writes to a floating point register must check that the register is not the destination of a pending operation. The following instructions fall into this class:

  Loads     LF and LD.
  Stores    SF and SD.
  Moves     MOVFP2I, MOVI2FP, MOVF, MOVD.
  Converts  CVTD2FP, CVTD2I, CVTFP2D, CVTFP2I, CVTI2D, CVTI2FP.
  Sets      SEQF, SNEF, SLTF, SLEF, SGTF, SGEF, SEQD, SNED, SLTD, SLED, SGTD, SGED.

When any of these instructions is executed, a call to FPwait is made. This is the third and final function for handling floating point execution. It checks that all writebacks into the appropriate registers have occurred. The number of registers which need to be checked varies. For an LF instruction, only a single register needs to be checked, while four registers must be checked on a MOVD. If any of the registers are the destinations of pending operations, FPwait will adjust cyclecount and FPstalls appropriately, and call FPwriteBack to write the results back to the registers. When FPwait returns, all RAW and WAW hazard conditions will have passed.
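The interplay of the three functions described in Section 3.5 can be sketched in a compact Python model. It is an illustration under stated assumptions rather than a transcription of DLXsim's C code: the names FPop, FPState, fp_issue, fp_write_back, fp_wait and issue_with_stalls are hypothetical, a sorted Python list stands in for the linked chain of pending operations, only single precision is modelled, the cycle-count reset is omitted, and freeing the execution unit at writeback time is an assumption about when a unit becomes available again.

```python
from dataclasses import dataclass, field


@dataclass
class FPop:
    kind: str      # "add", "mul" or "div": which type of unit runs it
    unit: int      # index of the execution unit in use
    dest: int      # destination FP register
    result: float  # computed at issue time
    ready: int     # cycle at which writeback occurs


@dataclass
class FPState:
    latency: dict  # unit kind -> latency in cycles
    units: dict    # unit kind -> list of busy-until cycles (0 = free)
    fregs: list = field(default_factory=lambda: [0.0] * 32)
    waiting_fprs: list = field(default_factory=lambda: [0] * 32)
    pending: list = field(default_factory=list)  # FPops, soonest first
    cycle_count: int = 1
    fp_stalls: int = 0


def fp_write_back(m):
    """Retire every pending operation whose ready cycle has been reached."""
    while m.pending and m.pending[0].ready <= m.cycle_count:
        op = m.pending.pop(0)
        m.fregs[op.dest] = op.result
        m.waiting_fprs[op.dest] = 0
        m.units[op.kind][op.unit] = 0  # assumption: unit is freed at writeback


def fp_issue(m, kind, fn, dest, src1, src2):
    """Return 0 on successful issue, else the soonest cycle a hazard clears."""
    # RAW hazards (sources) and WAW hazard (destination).
    blockers = [m.waiting_fprs[r] for r in (src1, src2, dest) if m.waiting_fprs[r]]
    free = [i for i, busy in enumerate(m.units[kind]) if busy == 0]
    if not free:
        blockers.append(min(m.units[kind]))  # structural hazard
    if blockers:
        return min(blockers)
    ready = m.cycle_count + m.latency[kind]
    m.pending.append(FPop(kind, free[0], dest, fn(m.fregs[src1], m.fregs[src2]), ready))
    m.pending.sort(key=lambda p: p.ready)  # keep soonest completion at the head
    m.units[kind][free[0]] = ready
    m.waiting_fprs[dest] = ready
    return 0


def issue_with_stalls(m, kind, fn, dest, src1, src2):
    """Caller-side loop: stall and write back until the issue succeeds."""
    fp_write_back(m)
    while True:
        until = fp_issue(m, kind, fn, dest, src1, src2)
        if until == 0:
            break
        m.fp_stalls += until - m.cycle_count
        m.cycle_count = until
        fp_write_back(m)
    m.cycle_count += 1  # the issuing instruction itself takes a cycle


def fp_wait(m, regs):
    """Stall until no register in regs is the target of a pending operation."""
    for r in regs:
        until = m.waiting_fprs[r]
        if until > m.cycle_count:
            m.fp_stalls += until - m.cycle_count
            m.cycle_count = until
    fp_write_back(m)
```

For example, with one adder of latency 2 and one multiplier of latency 5 (made-up numbers, not DLXsim's defaults), issuing an add into F3 followed by a multiply that reads F3 makes the multiply stall until the add's writeback cycle, and a subsequent fp_wait on the multiply's destination advances the cycle count to its ready cycle before the register is read.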


More information

RISC-V Assembly and Binary Notation

RISC-V Assembly and Binary Notation RISC-V Assembly and Binary Notation L02-1 Course Mechanics Reminders Course website: http://6004.mit.edu All lectures, videos, tutorials, and exam material can be found under Information/Resources tab.

More information

Lecture 4: Introduction to Advanced Pipelining

Lecture 4: Introduction to Advanced Pipelining Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK

More information

Question 0. Do not turn this page until you have received the signal to start. (Please fill out the identification section above) Good Luck!

Question 0. Do not turn this page until you have received the signal to start. (Please fill out the identification section above) Good Luck! CSC B58 Winter 2017 Final Examination Duration 2 hours and 50 minutes Aids allowed: none Last Name: Student Number: UTORid: First Name: Question 0. [1 mark] Read and follow all instructions on this page,

More information

Computer Architecture, EDT030

Computer Architecture, EDT030 Department of Information Technology Computer Architecture, EDT030 Exam, Tuesday, March 7, 2000, 8.00-12.00 am The exam consists of a number of assignments with a total of 100 points. Grading: 40p grade

More information

Instruction Set Principles. (Appendix B)

Instruction Set Principles. (Appendix B) Instruction Set Principles (Appendix B) Outline Introduction Classification of Instruction Set Architectures Addressing Modes Instruction Set Operations Type & Size of Operands Instruction Set Encoding

More information

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam One 4 February Your Name (please print clearly)

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam One 4 February Your Name (please print clearly) Your Name (please print clearly) This exam will be conducted according to the Georgia Tech Honor Code. I pledge to neither give nor receive unauthorized assistance on this exam and to abide by all provisions

More information

ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST *

ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * SAMPLE 1 Section: Simple pipeline for integer operations For all following questions we assume that: a) Pipeline contains 5 stages: IF, ID, EX,

More information

DLX computer. Electronic Computers M

DLX computer. Electronic Computers M DLX computer Electronic Computers 1 RISC architectures RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer In CISC architectures the 10% of the instructions are used in 90%

More information

Computer Architecture. Chapter 3: Arithmetic for Computers

Computer Architecture. Chapter 3: Arithmetic for Computers 182.092 Computer Architecture Chapter 3: Arithmetic for Computers Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, Morgan Kaufmann Publishers and Mary Jane Irwin

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

09-1 Multicycle Pipeline Operations

09-1 Multicycle Pipeline Operations 09-1 Multicycle Pipeline Operations 09-1 Material may be added to this set. Material Covered Section 3.7. Long-Latency Operations (Topics) Typical long-latency instructions: floating point Pipelined v.

More information

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam Two 11 March Your Name (please print) total

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam Two 11 March Your Name (please print) total Instructions: This is a closed book, closed note exam. Calculators are not permitted. If you have a question, raise your hand and I will come to you. Please work the exam in pencil and do not separate

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 th The Hardware/Software Interface Edition Chapter 2 Instructions: Language of the Computer 2.1 Introduction Instruction Set The repertoire of instructions of a computer

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

Q1: /30 Q2: /25 Q3: /45. Total: /100

Q1: /30 Q2: /25 Q3: /45. Total: /100 ECE 2035(A) Programming for Hardware/Software Systems Fall 2013 Exam One September 19 th 2013 This is a closed book, closed note texam. Calculators are not permitted. Please work the exam in pencil and

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

Adventures in Assembly Land

Adventures in Assembly Land Adventures in Assembly Land What is an Assembler ASM Directives ASM Syntax Intro to SPIM Simple examples L6 Simulator 1 A Simple Programming Task Add the numbers 0 to 4 10 = 0 + 1 + 2 + 3 + 4 In C : int

More information

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods 10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may

More information

MIPS Assembly Language. Today s Lecture

MIPS Assembly Language. Today s Lecture MIPS Assembly Language Computer Science 104 Lecture 6 Homework #2 Midterm I Feb 22 (in class closed book) Outline Assembly Programming Reading Chapter 2, Appendix B Today s Lecture 2 Review: A Program

More information

Instructions: Language of the Computer

Instructions: Language of the Computer Instructions: Language of the Computer Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class

More information

Today s Lecture. MIPS Assembly Language. Review: What Must be Specified? Review: A Program. Review: MIPS Instruction Formats

Today s Lecture. MIPS Assembly Language. Review: What Must be Specified? Review: A Program. Review: MIPS Instruction Formats Today s Lecture Homework #2 Midterm I Feb 22 (in class closed book) MIPS Assembly Language Computer Science 14 Lecture 6 Outline Assembly Programming Reading Chapter 2, Appendix B 2 Review: A Program Review:

More information

ECE Exam I February 19 th, :00 pm 4:25pm

ECE Exam I February 19 th, :00 pm 4:25pm ECE 3056 Exam I February 19 th, 2015 3:00 pm 4:25pm 1. The exam is closed, notes, closed text, and no calculators. 2. The Georgia Tech Honor Code governs this examination. 3. There are 4 questions and

More information

Processor: Superscalars Dynamic Scheduling

Processor: Superscalars Dynamic Scheduling Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),

More information

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support Components of an ISA EE 357 Unit 11 MIPS ISA 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support SUBtract instruc. vs. NEGate + ADD instrucs. 3. Registers accessible

More information

Digital Design Using Verilog and FPGAs An Experiment Manual. Chirag Sangani Abhishek Kasina

Digital Design Using Verilog and FPGAs An Experiment Manual. Chirag Sangani Abhishek Kasina Digital Design Using Verilog and FPGAs An Experiment Manual Chirag Sangani Abhishek Kasina ii Contents I Combinatorial and Sequential Circuits 1 1 Seven-Segment Decoder 3 1.1 Concept.........................................

More information

M2 Instruction Set Architecture

M2 Instruction Set Architecture M2 Instruction Set Architecture Module Outline Addressing modes. Instruction classes. MIPS-I ISA. High level languages, Assembly languages and object code. Translating and starting a program. Subroutine

More information

Kernel Registers 0 1. Global Data Pointer. Stack Pointer. Frame Pointer. Return Address.

Kernel Registers 0 1. Global Data Pointer. Stack Pointer. Frame Pointer. Return Address. The MIPS Register Set The MIPS R2000 CPU has 32 registers. 31 of these are general-purpose registers that can be used in any of the instructions. The last one, denoted register zero, is defined to contain

More information

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University. Instructions: ti Language of the Computer Rui Wang, Assistant professor Dept. of Information and Communication Tongji University it Email: ruiwang@tongji.edu.cn Computer Hierarchy Levels Language understood

More information

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow? Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

Chapter 2A Instructions: Language of the Computer

Chapter 2A Instructions: Language of the Computer Chapter 2A Instructions: Language of the Computer Copyright 2009 Elsevier, Inc. All rights reserved. Instruction Set The repertoire of instructions of a computer Different computers have different instruction

More information

ECE 2035 Programming HW/SW Systems Fall problems, 6 pages Exam Two 23 October Your Name (please print clearly) Signed.

ECE 2035 Programming HW/SW Systems Fall problems, 6 pages Exam Two 23 October Your Name (please print clearly) Signed. Your Name (please print clearly) This exam will be conducted according to the Georgia Tech Honor Code. I pledge to neither give nor receive unauthorized assistance on this exam and to abide by all provisions

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

CPS311 - COMPUTER ORGANIZATION. A bit of history

CPS311 - COMPUTER ORGANIZATION. A bit of history CPS311 - COMPUTER ORGANIZATION A Brief Introduction to the MIPS Architecture A bit of history The MIPS architecture grows out of an early 1980's research project at Stanford University. In 1984, MIPS computer

More information

/ : Computer Architecture and Design Fall 2014 Midterm Exam Solution

/ : Computer Architecture and Design Fall 2014 Midterm Exam Solution 16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam Solution 1. (8 points) UEvaluating instructions Assume the following initial state prior to executing the instructions below. Note

More information