TAMPERE UNIVERSITY OF TECHNOLOGY Institute of Digital and Computer Systems. Exercise 2: DLX I - Architecture


Carmella Bailey
5 years ago

TKT-3206 Computer Architecture I
Exercise 2: DLX I - Architecture

Group: ____
Name: ____   Student number: ____

General info about the exercise

The purpose of this exercise is to explain how a pipelined processor works and which factors affect its performance. The exercise uses a simulator of a pipelined DLX processor, developed at Lund University, Sweden. To return the exercise, hand in this document with all questions answered. Give clear and brief, yet sufficiently accurate answers to all the questions. Write your answers in readable handwriting, using a pencil so that possible errors can be corrected. Unclear, unintelligible or non-descriptive answers result in a boomerang (the work is returned for correction) or possibly in re-doing the whole exercise from the beginning.

DLX simulator

In this exercise, a datapath model of a DLX processor is studied. A comprehensive model that effectively resolves hazards is developed in three steps. The first model, Datapath Model 1 (DP_1), lacks support for the bypassing technique. The second model, Datapath Model 2 (DP_2), supports bypassing but uses the ALU to calculate the jump addresses. The third model, DLX, has a separate adder for the jump address calculation. The simulator supports only integer instructions. The implementation of the control logic for the DLX processor model is studied in the DLX-CONTROL exercise.

The simulator accepts symbolic machine language (assembly) written for the DLX processor. Assembly code for the simulator can be written with a regular text editor. The code file must have the extension '.s', and the last instruction of the program must be trap 0.

Installation

Go to your home directory in Birdland. Copy the simulator and the program code files to your account (first make sure that you have at least 1.1 MB of quota left) using the command: cp -r ~rhu/public_html/tktekn/dlx/. Now you have all the files needed for this exercise and the DLX-CONTROL exercise in the subdirectory DLX. Go to this subdirectory (cd DLX).

Starting the simulator

The simulator is started with the command sparc_pipe (or ./sparc_pipe if the current directory is not in the search path of your environment settings), and the simulator window opens.

The window shows the five stages of the processor, from left to right: IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory Access) and WB (Write Back to the register file). The processor is pipelined by inserting D flip-flops (registers) between the stages. On each clock cycle, the intermediate results of each stage are saved into these registers, which the simulator shows as boxes on top of the dashed lines. The functional units that perform the computation reside between the dashed lines. The computation proceeds from left to right, except for the write-backs. The bold lines represent 32-bit buses that connect the functional units and registers. The thin lines are one-bit control signals. A diagram that groups the clock-synchronized memory elements on vertical lines in this way is called a Werner diagram. The instruction currently in each stage is shown in the box below the stage. The contents of the register file can be examined by clicking the left mouse button on top of the Register File box in the simulator. The contents of the register file cannot be altered manually.

The functions of the simulator

In the top-left corner of the window are two pull-down menus: File and Views. Assembly code is loaded with the Load command of the File menu, and the simulator is exited with the Quit command. The datapath model (DP_1, DP_2 or DLX) is chosen from the Views menu. Each datapath model has three options that are used in examining hazards: Pipelining/No Pipelining, Delayed Load/No Delayed Load and Delayed Branch/No Delayed Branch. These options are changed by clicking the boxes at the top of the simulator window. Before executing a program, the Program Counter (PC) and the register file must be reset. This is done by clicking the Reset box. Note that Reset does not reset the data memory; the data memory can be reset, e.g., by (re)loading the program code.
The program can be executed one clock cycle at a time by clicking the Clock box. Clicking Run executes the whole program. Elapsed Cycles shows the number of clock cycles elapsed since the first instruction entered the IF stage or, if Run was clicked, the total number of clock cycles from the start until the last instruction reaches the WB stage.

Clock cycles Per Instruction (CPI) is calculated with the following formula:

$$\mathrm{CPI} = \frac{\mathit{NumberOfClockCycles}}{\mathit{NumberOfInstructions}_{\text{non-NOP}}} = \frac{\mathit{NumberOfClockCycles}}{\mathit{NumberOfClockCycles} - \mathit{Hazards} - \mathit{FillingThePipeline}}$$

Hazards is the number of clock cycles needed to resolve hazards. It is the sum of two parts: the NOP instructions that reach the WB stage after the first instruction has reached WB, and the clock cycles during which the pipeline is stalled due to hazards.

Sub-operations of the DLX instructions

Examine the following program:

    LW   R3,18(R0)
    ADD  R1,R3,R3
    SW   18(R0),R1

What does M[18+R0] contain after the program execution, if M[18+R0] is ____ before the execution?

Next, we study which sub-operations an unpipelined datapath performs in each of its stages for the instructions of the above code. By running the program without pipelining, we can concentrate on the execution of a single instruction at a time and examine what happens in the different pipeline stages.

Load the program code exempel1.s into the simulator by selecting File->Load. Type the name of the code file and click OK. Make sure that the datapath settings are: Datapath Model 1, No Pipelining, No Delayed Load and No Delayed Branch.

Give a clock pulse by clicking the Clock box. Which instruction is read into the processor? What does that instruction do?

Give a clock pulse. Enter the data into table 1 (the abbreviations of the sub-operations are listed in table 2). What is done in the ID stage?

Give a clock pulse. Where does the ALU get its operands? What is the result of the ALU operation? What was calculated in the ALU?

Give a clock pulse. What value does DMAR get? What is the DMAR bus connected to? Which sub-operation was performed in the EX stage?
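You can check the CPI formula with a small helper. The function name `cpi` and the sample numbers are hypothetical. `filling_the_pipeline` has to be read off the simulator. For an ideal five-stage pipeline it would be 4 cycles, but that value is an assumption.

```python
from fractions import Fraction

def cpi(clock_cycles: int, hazards: int, filling_the_pipeline: int) -> Fraction:
    """CPI = cycles / non-NOP instructions, where the instruction count is
    recovered as cycles - hazard cycles - pipeline-filling cycles."""
    instructions = clock_cycles - hazards - filling_the_pipeline
    if instructions <= 0:
        raise ValueError("no completed non-NOP instructions")
    return Fraction(clock_cycles, instructions)

# Hypothetical reading: 20 elapsed cycles, 3 hazard cycles, 4 filling cycles
example = cpi(20, 3, 4)
print(float(example))
```

Returning a `Fraction` keeps the ratio exact, so it can be compared directly with a hand calculation.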

What does register R3 contain (click the Register File)?

Give a clock pulse. What does register R3 now contain, and why?

The instruction LW R3,18(R0) has now been executed. Mark the performed sub-operations into table 1 using the abbreviations listed in table 2. On the last row, mark which functional unit in the Werner diagram performs the operation in each stage.

Table 1: The sub-operations of the example code.

    Class                 IF     ID     EX     MEM    WB
    LOAD instruction
    ALU instruction
    STORE instruction
    BRANCH instruction
    Functional unit

Table 2: Sub-operation abbreviations for table 1.

    Sub-operation                    Abbreviation
    Instruction Fetch                IF
    Register Read                    RR
    Arithmetic/logical Operation     AO
    Operand address Calculation      OC
    Jump address Calculation         JC
    Memory Read                      MR
    Memory Write                     MW
    Register Write                   RW
    Load Program Counter             LPC
    no sub-operation                 -

Give a clock pulse. The instruction ADD R1,R3,R3 is now fetched. Answer the following questions by giving clock pulses. What is done in the EX stage? Which sub-operation was performed in the MEM stage?

Examine the contents of registers R1 and R3 before you move the instruction into the WB stage. What do R1 and R3 contain? R1 = ____ R3 = ____

Give a clock pulse. What does register R1 now contain, and why? R1 = ____

Fill into table 1 the sub-operations that were just performed.

The next instruction is SW 18(R0),R1. Execute the instruction by giving enough clock pulses and answer the following questions. What does the instruction do? Which registers were read in the ID stage? What was done in the EX stage? Which sub-operation was performed in the MEM stage? A special bus was used to transfer the contents of register R1. Why? Which sub-operation is performed in the WB stage? Fill in the sub-operations of the instruction into table 1.

When determining the stage in which a given sub-operation is performed, it does not matter where the corresponding functional unit resides. The stage of a sub-operation is defined by the location of the instruction at the moment the sub-operation is executed. For example, the register write is performed in the WB stage even though the register file is drawn in the ID stage. This is an important property when the sub-operations of a BRANCH instruction are examined.

Finally, we study the sub-operations of the instruction BEQZ R0,14. What is done in the ID stage? What is done in the EX stage? What is the output of the ALU?

What controls the multiplexer MX1? In which situations does the control signal of MX1 change its state? The value of the PC changes as you move the instruction into the WB stage. What is the new PC value? Why does the PC change to this particular value?

Now fill in the missing fields of table 1. Table 1 shows that some instructions perform a sub-operation in every stage, while others have no sub-operation in some of their stages. For which instruction classes, and in which stages, are there no sub-operations?

So far, each instruction has been executed separately. An important measure of code efficiency is the average number of clock cycles needed to execute one instruction. This figure is called Clock cycles Per Instruction (CPI). What is the CPI of the executed program, and why?

Parallelism and pipelining

For pipelining to be possible, every instruction must be divisible into sub-operations that are performed in order. However, an instruction does not have to use every sub-operation. Furthermore, the processor must have enough functional units and buses so that each pipeline stage can execute a sub-operation on every clock cycle. If this condition is not met, the pipeline must be stalled until the busy functional unit or bus becomes available again. This situation is called a structural hazard.

Mark into table 1 which pipeline stages use the following functional units: data memory, instruction memory, ALU and register file. Why can two pipeline stages use the register file simultaneously without causing a structural hazard? Which stages are these? What architectural properties must the memory have in order to avoid a structural hazard?
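The following sketch shows how a shared memory becomes a structural hazard. It assumes a hypothetical machine with one unified, single-ported memory, an ideal pipeline with no stalls, and instruction i entering IF in cycle i. It lists the cycles in which IF (fetching an instruction) and MEM (a LOAD or STORE) would both need that memory. `memory_conflicts` and the class names are illustrative only.

```python
from typing import List

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(program: List[str]) -> List[int]:
    """Return the cycles in which IF and MEM both need a single shared memory.

    program: instruction classes in program order, e.g. ["LOAD", "ALU", "STORE"].
    Assumes an ideal pipeline where instruction i is in IF during cycle i.
    """
    conflicts = []
    n = len(program)
    for cycle in range(n + len(STAGES) - 1):
        fetching = cycle < n                        # some instruction is in IF
        mem_idx = cycle - STAGES.index("MEM")       # instruction currently in MEM
        accessing = 0 <= mem_idx < n and program[mem_idx] in ("LOAD", "STORE")
        if fetching and accessing:
            conflicts.append(cycle)
    return conflicts

print(memory_conflicts(["LOAD", "ALU", "STORE", "ALU", "ALU", "ALU"]))
```

If instruction and data memories are separate (or the memory has two ports), IF and MEM never compete. This model would then report no conflicts at all.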

From now on, we assume that the memory is built so that there are no structural hazards.

Examine the following program:

    ADDI R1,R0,#1   ;R1 <- 1
    ADDI R2,R0,#2   ;R2 <- 2
    ADDI R3,R0,#4   ;R3 <- 4
    ADD  R1,R1,R1   ;R1 <- R1+R1
    ADD  R2,R2,R2   ;R2 <- R2+R2
    ADD  R3,R3,R3   ;R3 <- R3+R3
    ADD  R1,R1,R1   ;R1 <- R1+R1
    ADD  R2,R2,R2   ;R2 <- R2+R2
    ADD  R3,R3,R3   ;R3 <- R3+R3

Determine by examining the code what the registers R1, R2 and R3 contain after the execution. Give the answer in hexadecimal. R1 = ____ R2 = ____ R3 = ____

The program is in file exempel2.s. Load it into the simulator and run it (Run). How many clock cycles does the execution take when only one instruction is performed at a time (DP_1, No Pipelining)?

Now enable pipelining (Pipelining); the processor is reset. Give clock pulses until the first instruction reaches the MEM stage. How many clock cycles does it take? From this point on, one instruction finishes on every clock cycle: the pipeline is now full.

Give a clock pulse. Which register is updated, and which registers are read? How many instructions are executed simultaneously in the pipeline? Give clock pulses until the last instruction is in the WB stage. What are the contents of registers R1, R2 and R3? R1 = ____ R2 = ____ R3 = ____ How many clock cycles did the execution take? If the clock cycles needed to fill the pipeline are not counted, how much faster is the execution with pipelining? Also write down the formula you used to calculate the result.

From this we can conclude that pipelining significantly increases the computation speed if there are no structural hazards. In practice, however, some problems reduce this speed-up. Next, we study what exactly these problems are.
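The speed-up question can be framed with the textbook model of an ideal k-stage pipeline. An unpipelined datapath needs k cycles per instruction. A pipelined one needs k - 1 fill cycles and then completes one instruction per cycle. This is an idealized sketch with no hazards. The simulator's Elapsed Cycles may be counted slightly differently, and the function names are hypothetical.

```python
def unpipelined_cycles(n_instr: int, stages: int = 5) -> int:
    # Each instruction occupies the whole datapath for `stages` cycles.
    return stages * n_instr

def pipelined_cycles(n_instr: int, stages: int = 5) -> int:
    # stages - 1 cycles to fill the pipeline, then one instruction completes per cycle.
    return (stages - 1) + n_instr

def speedup_ignoring_fill(n_instr: int, stages: int = 5) -> float:
    # Drop the fill cycles from the pipelined count before comparing.
    fill = stages - 1
    return unpipelined_cycles(n_instr, stages) / (pipelined_cycles(n_instr, stages) - fill)

n = 9  # instructions in the listing above
print(unpipelined_cycles(n), pipelined_cycles(n), speedup_ignoring_fill(n))
```

In this idealized model, ignoring the fill cycles makes the speed-up equal to the number of stages, regardless of program length.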

Data hazards

Next, we determine which special mechanisms are required to guarantee correct program execution despite data dependencies between the instructions.

Bypassing

Examine the following program:

    ADDI R1,R0,#2   ;R1 <- 2
    ADD  R2,R1,R1   ;R2 <- R1+R1
    ADD  R3,R1,R1   ;R3 <- R1+R1
    ADD  R4,R1,R1   ;R4 <- R1+R1

What should the registers R1, R2, R3 and R4 contain after the program execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____

The program is in file exempel3.s. Load and run the program (DP_1, Pipelining). What do the registers R1, R2, R3 and R4 contain after the execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____

Apparently, something went wrong. To understand what, run the program again. Reset the processor and give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. What is the value of register R1? R1 = ____ Where in the datapath is the correct value of register R1?

Give clock pulses until instruction ADD R3,R1,R1 reaches the ID stage. What is the value of register R1 when it is read? R1 = ____ Where in the datapath is the correct value of register R1?

Give clock pulses until instruction ADD R4,R1,R1 reaches the ID stage. What is the value of register R1? Is it correct? R1 = ____

Now you have (hopefully) understood that the register file does not always contain the correct values. It must therefore be possible to bypass the result of an instruction to the next two instructions if they use this result as an operand.
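The rule above, that a result must be bypassed to the next two instructions, can be written as a check over the program. This sketch lists every source operand whose nearest earlier writer is one or two instructions back. It assumes, as the exercise shows for ADD R4,R1,R1, that an instruction three positions later already reads the updated register file. `Instr` and `bypass_needs` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str               # assembly text, used only for reporting
    dest: Optional[str]     # register written in WB, None if nothing is written
    srcs: Tuple[str, ...]   # registers read in ID

def bypass_needs(program: List[Instr], window: int = 2) -> List[Tuple[str, str, int]]:
    """List (consumer, register, distance) for each source operand whose nearest
    earlier writer is at most `window` instructions back, i.e. a value the
    register file does not yet hold when the consumer reads it in ID."""
    needs = []
    for i, instr in enumerate(program):
        for reg in dict.fromkeys(instr.srcs):     # each source register once, in order
            if reg == "R0":                       # R0 is constant zero, never forwarded
                continue
            for d in range(1, min(window, i) + 1):
                if program[i - d].dest == reg:    # nearest earlier writer wins
                    needs.append((instr.text, reg, d))
                    break
    return needs

EXEMPEL3 = [
    Instr("ADDI R1,R0,#2", "R1", ("R0",)),
    Instr("ADD R2,R1,R1", "R2", ("R1", "R1")),
    Instr("ADD R3,R1,R1", "R3", ("R1", "R1")),
    Instr("ADD R4,R1,R1", "R4", ("R1", "R1")),
]
print(bypass_needs(EXEMPEL3))
```

The reported distance (1 or 2) indicates how far down the pipeline the producer is when the consumer needs the value.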

From which point(s) of the pipeline must this bypassing be possible?

The next datapath model, Datapath Model 2 (DP_2), supports the bypassing technique. Choose datapath model DP_2 from the Views menu (this resets the processor). Give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. Examine the state of multiplexers MX2 and MX3. Where does the value of R1 come from? Give clock pulses until instruction ADD R3,R1,R1 reaches the ID stage. Examine the state of multiplexers MX2 and MX3 again. Where does the value of R1 come from now? Give clock pulses until the program has been completely executed, and make sure that the registers have the correct values.

Delayed Load

Examine the following program:

    LW   R1,18(R0)  ;R1 <- M[18]
    ADD  R2,R1,R1   ;R2 <- R1+R1
    ADD  R3,R1,R1   ;R3 <- R1+R1
    ADD  R4,R1,R1   ;R4 <- R1+R1

What should the registers R1, R2, R3 and R4 contain after the program execution if M[18] = ____ before the execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____

The program is in file exempel4.s. Load and run the program (DP_2, Pipelining). What do the registers R1, R2, R3 and R4 contain after the execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____

Apparently, something went wrong. To understand what, run the program again. Reset the processor and give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. In which stage is instruction LW R1,18(R0)? Give a clock pulse. Where in the datapath is the correct value of register R1? In which stage is instruction ADD R2,R1,R1?

Why does instruction ADD R3,R1,R1 get the correct register value? The execution of one instruction failed. Which instruction was it, and why did exactly this instruction fail?

This problem can be solved with Delayed Load. Whenever the result of a LOAD instruction is used by the next instruction, the pipeline is stalled until bypassing can forward the loaded value to the next instruction(s). Activate Delayed Load by clicking the field No Delayed Load (DP_2, Pipelining, Delayed Load, Delayed Branch). Give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. In which stage is instruction LW R1,18(R0)? Give a clock pulse. In which stage is instruction ADD R2,R1,R1 now? The pipeline is stalled (STALL), so the instructions in the ID and IF stages no longer progress. Where is the value of register R1 now obtained from? Finish the execution with the Run command and make sure that the register values are now correct. How many clock cycles did the execution take? How many clock cycles would the execution take if there were no data dependency between the LOAD instruction and the first ADD instruction?

Delayed Load and bypassing together guarantee that all data hazards are eliminated. However, one clock cycle is lost every time a LOAD instruction is immediately followed by an instruction that depends on it.
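Based on the statement above (one lost cycle whenever an LW result is read by the very next instruction), the Delayed Load interlock can be counted statically. In this minimal sketch, the tuple encoding and the name `load_use_stalls` are hypothetical. The same check can be applied later to the loop bodies of exempel6.s and exempel7.s.

```python
from typing import List, Optional, Tuple

# (opcode, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

def load_use_stalls(program: List[Instr]) -> int:
    """Count the stall cycles Delayed Load inserts in a straight-line sequence:
    one cycle whenever an LW result is read by the immediately following instruction."""
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(program, program[1:]):
        if op == "LW" and dest not in (None, "R0") and dest in srcs:
            stalls += 1
    return stalls

EXEMPEL4 = [("LW", "R1", ("R0",)), ("ADD", "R2", ("R1", "R1")),
            ("ADD", "R3", ("R1", "R1")), ("ADD", "R4", ("R1", "R1"))]
# Loop bodies of exempel6.s and exempel7.s (ADD and SUBI in swapped order)
EXEMPEL6_LOOP = [("LW", "R3", ("R2",)), ("ADD", "R1", ("R3", "R1")),
                 ("SUBI", "R2", ("R2",)), ("BNEZ", None, ("R2",))]
EXEMPEL7_LOOP = [("LW", "R3", ("R2",)), ("SUBI", "R2", ("R2",)),
                 ("ADD", "R1", ("R3", "R1")), ("BNEZ", None, ("R2",))]

for name, prog in [("exempel4", EXEMPEL4), ("exempel6 loop", EXEMPEL6_LOOP),
                   ("exempel7 loop", EXEMPEL7_LOOP)]:
    print(name, load_use_stalls(prog))
```

Moving an independent instruction between the LW and its consumer removes the stall. This is the static scheduling idea studied later in the exercise.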

Control hazards

Examine the following program (the memory addresses of the instructions are on the left):

    0000        ADDI R1,R0,#9   ;R1 <- 9
    0004        ADD  R2,R0,R0   ;R2 <- 0
    0008        ADD  R3,R0,R0   ;R3 <- 0
    000C        ADD  R4,R0,R0   ;R4 <- 0
    0010        ADD  R5,R0,R0   ;R5 <- 0
    0014 LOOP:  ADDI R2,R2,#1   ;R2 <- R2+1
    0018        SUBI R1,R1,#1   ;R1 <- R1-1
    001C        BNEZ R1,LOOP    ;If R1 <> 0 then BRANCH to LOOP
    0020        ADD  R3,R3,R2   ;R3 <- R3+R2
    0024        ADD  R4,R4,R2   ;R4 <- R4+R2
    0028        ADD  R5,R5,R2   ;R5 <- R5+R2

What should the registers R1, R2, R3, R4 and R5 contain after the program execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____ R5 = ____

What should the registers R3, R4 and R5 contain just before instruction ADD R3,R3,R2 is executed? R3 = ____ R4 = ____ R5 = ____

How many instructions are executed?

The program is in file exempel5.s. Load and run the program (DP_2, Pipelining, Delayed Load, Delayed Branch). What do the registers R1, R2, R3, R4 and R5 contain after the execution? R1 = ____ R2 = ____ R3 = ____ R4 = ____ R5 = ____

Reset the processor and give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time. Give a clock pulse. Which instruction is fetched? Give a clock pulse. Which instruction is fetched now? What is the value of the program counter? PC = ____ Give a clock pulse. What is the value of the program counter, and which instruction is fetched now? PC = ____
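To obtain reference values for the questions above, you can execute the listing one instruction at a time with no pipeline and no delay slots. A correct pipelined run should reproduce those values. This is a minimal sketch of a tiny DLX subset. The tuple encoding and the function `run` are hypothetical. Labels map to list indices instead of byte addresses (byte address = 4 × index).

```python
from typing import Dict, List, Tuple

def run(program: List[tuple], labels: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Execute a tiny DLX subset sequentially (no pipeline, no delay slots).
    Returns (registers, number of executed instructions)."""
    regs: Dict[str, int] = {}

    def get(reg: str) -> int:
        return 0 if reg == "R0" else regs.get(reg, 0)   # R0 always reads as 0

    pc, executed = 0, 0
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        pc += 1
        if op == "ADD":
            regs[args[0]] = get(args[1]) + get(args[2])
        elif op == "ADDI":
            regs[args[0]] = get(args[1]) + args[2]
        elif op == "SUBI":
            regs[args[0]] = get(args[1]) - args[2]
        elif op == "BNEZ":
            if get(args[0]) != 0:
                pc = labels[args[1]]              # taken branch: jump to the label
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, executed

EXEMPEL5 = [
    ("ADDI", "R1", "R0", 9), ("ADD", "R2", "R0", "R0"), ("ADD", "R3", "R0", "R0"),
    ("ADD", "R4", "R0", "R0"), ("ADD", "R5", "R0", "R0"),
    ("ADDI", "R2", "R2", 1),                     # LOOP (index 5, address 0014)
    ("SUBI", "R1", "R1", 1), ("BNEZ", "R1", "LOOP"),
    ("ADD", "R3", "R3", "R2"), ("ADD", "R4", "R4", "R2"), ("ADD", "R5", "R5", "R2"),
]
registers, count = run(EXEMPEL5, {"LOOP": 5})
print(registers, count)
```

Compare the printed register values with the simulator's result in the Delayed Branch configuration. Any difference comes from instructions fetched after the branch.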

Give a clock pulse. What is the value of register R3, and why is it wrong? R3 = ____

After the branch instruction has reached the ID stage, how many clock cycles does it take until the address determined by the branch instruction is loaded into the program counter?

The processor does not function correctly. The problem is that the branch address is not loaded into the program counter immediately. Instead, some instructions following the branch instruction are executed regardless of whether the branch condition evaluates to true or false. A simple solution is to stall the pipeline whenever a branch instruction is about to proceed from the ID stage. Execution continues only after the jump address has been calculated and the branch condition evaluated or, if the branch is taken, after the program counter has been updated to point to the new instruction. This is achieved with the option No Delayed Branch, which we study next.

Change the processor settings to: DP_2, Pipelining, Delayed Load, No Delayed Branch. Give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time. Give a clock pulse. Which instruction is fetched? Instruction fetching has apparently been halted. After how many clock cycles is the fetching of new instructions resumed? Finish the execution of the program and make sure that the register values are correct. How many clock cycles does the execution take? Why does the number of clock cycles differ greatly from the number of executed instructions?

Now the processor works correctly with branch instructions, but the large branch overhead greatly degrades performance. In practice, processors spend a large portion of their total execution time in program loops. Since loops are predominantly short, a large loop overhead is not acceptable.
Advancing the jump address calculation

The problem with Datapath Model 2 is that the jump address and branch condition are calculated in the EX stage, so the results are available only in the MEM stage. If the calculation could be done earlier, the loop overhead would be smaller.
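A rough cycle model shows why moving the branch decision earlier helps. Each branch adds a fixed number of stall cycles when No Delayed Branch is selected, so the total grows with the number of executed branches times the penalty. The penalties and instruction counts below are hypothetical placeholders. Substitute the stall count you measured in the simulator and the counts of your own program. The model also ignores any other interaction in the pipeline.

```python
def estimated_cycles(instructions: int, branches: int, branch_penalty: int,
                     other_stalls: int = 0, fill: int = 4) -> int:
    """Rough cycle count: pipeline fill + one cycle per executed instruction
    + `branch_penalty` stall cycles for every branch (taken or not)."""
    return fill + instructions + branches * branch_penalty + other_stalls

# Hypothetical loop: 30 executed instructions, 10 of them branches.
for penalty in (3, 1):  # placeholder penalties: branch resolved late vs. early
    print(penalty, estimated_cycles(30, 10, penalty))
```

Reducing the penalty per branch cuts the overhead in proportion to how many branches the loop executes.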

The earliest stage in which the processor recognizes an instruction as a branch instruction is the ID stage. Thus, the jump address calculation and branch condition evaluation can already be done in the ID stage, as in the datapath model DLX.

Choose datapath model DLX (DLX, Pipelining, Delayed Load, No Delayed Branch). How does the DLX model differ from the DP_2 model? Load the same program as before (exempel5.s). Give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time. Give a clock pulse. Which instruction is now fetched, and why? Finish the program execution. How many clock cycles did the execution take? How much faster was the execution compared to DP_2? What is the CPI value? The CPI value is now close to one, but not quite one. What is the reason for this? Next, we examine how the programmer and compiler can further reduce the CPI.

Co-operation between the compiler and pipeline

Examine the following program:

          ADDI R2,R0,#200  ;R2 <- 200
    LOOP: LW   R3,28(R2)   ;R3 <- M[28+R2]
          ADD  R1,R3,R1    ;R1 <- R3+R1
          SUBI R2,R2,#4    ;R2 <- R2-4
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

The program is in file exempel6.s. Load and run the program.

What is the CPI value? As the number of executed instructions grows, the clock cycles needed to fill the pipeline become insignificant. The CPI can therefore be approximated by dividing the number of clock cycles needed for one loop iteration by the number of instructions in the loop. Calculate an approximation of the CPI. In addition to the result, write down the formula you used.

Static instruction scheduling

Delayed load

What does the delayed load technique imply? Consider the issue from the viewpoints of both the programmer/compiler and the hardware.

Examine the following program:

          ADDI R2,R0,#200  ;R2 <- 200
    LOOP: LW   R3,28(R2)   ;R3 <- M[28+R2]
          SUBI R2,R2,#4    ;R2 <- R2-4
          ADD  R1,R3,R1    ;R1 <- R3+R1
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

How does this program differ from the previous one (exempel6.s)? The program is in file exempel7.s. Load and run the program (DLX, Pipelining, Delayed Load, No Delayed Branch). What is the CPI value?
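The per-iteration approximation described above is just (instructions + stall cycles) / instructions for one pass through the loop. In this sketch, `load_use_stalls` is one cycle per LW whose result is used by the next instruction, as stated for Delayed Load. `branch_stalls` is an assumption: use the branch penalty you observed for the DLX model with No Delayed Branch.

```python
def loop_cpi(instrs_per_iter: int, load_use_stalls: int = 0, branch_stalls: int = 0) -> float:
    """Steady-state CPI of a loop, ignoring the pipeline fill:
    (instructions + stall cycles per iteration) / instructions per iteration."""
    cycles_per_iter = instrs_per_iter + load_use_stalls + branch_stalls
    return cycles_per_iter / instrs_per_iter

# A 4-instruction loop with one load-use stall and an assumed 1-cycle branch stall,
# and the same loop after scheduling away the load-use stall.
print(loop_cpi(4, load_use_stalls=1, branch_stalls=1))
print(loop_cpi(4, load_use_stalls=0, branch_stalls=1))
```

The full-program CPI reported by the simulator will be slightly higher, because this approximation deliberately drops the fill cycles and the instructions outside the loop.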

Compare the CPI and the proportion of branch instructions among all executed instructions. Why is the CPI now better than with the previous program? How much faster was the execution compared to the previous code (exempel6.s)?

Delayed Branch

What does the delayed branch technique imply? Consider the issue from the viewpoints of both the programmer/compiler and the hardware.

Examine the following program:

          ADDI R2,R0,#200  ;R2 <- 200
    LOOP: LW   R3,28(R2)   ;R3 <- M[28+R2]
          SUBI R2,R2,#4    ;R2 <- R2-4
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
          ADD  R1,R3,R1    ;R1 <- R3+R1

How does this program differ from the previous one (exempel7.s)? Activate the option Delayed Branch. The program is in file exempel8.s. Load and run the program (DLX, Pipelining, Delayed Load, Delayed Branch). What is the CPI value now?
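A small interpreter makes the semantics of the delay slot concrete. With delayed branching, the instruction right after BNEZ always executes, and only then does control move to the branch target. Because of this, exempel8.s computes the same R1 as exempel7.s. This is a sketch under stated assumptions. The tuple encoding, `run_dlx`, and the memory contents (each word holding its own address) are hypothetical, chosen only so the two runs can be compared.

```python
from typing import Dict, List, Optional

def run_dlx(program: List[tuple], labels: Dict[str, int],
            memory: Dict[int, int], delayed_branch: bool) -> Dict[str, int]:
    """Execute a tiny DLX subset instruction by instruction. With
    delayed_branch=True the instruction after a taken BNEZ (its delay slot)
    executes before control moves to the branch target."""
    regs: Dict[str, int] = {}

    def get(reg: str) -> int:
        return 0 if reg == "R0" else regs.get(reg, 0)

    pc, target_after_slot = 0, None  # type: int, Optional[int]
    while pc < len(program):
        op, *a = program[pc]
        next_pc = pc + 1
        if target_after_slot is not None:     # this instruction is a delay slot
            next_pc, target_after_slot = target_after_slot, None
        if op == "ADD":
            regs[a[0]] = get(a[1]) + get(a[2])
        elif op == "ADDI":
            regs[a[0]] = get(a[1]) + a[2]
        elif op == "SUBI":
            regs[a[0]] = get(a[1]) - a[2]
        elif op == "LW":                      # ("LW", rd, offset, base) = LW rd,offset(base)
            regs[a[0]] = memory[a[1] + get(a[2])]
        elif op == "BNEZ":
            if get(a[0]) != 0:
                if delayed_branch:
                    target_after_slot = labels[a[1]]
                else:
                    next_pc = labels[a[1]]
        else:
            raise ValueError(f"unsupported opcode: {op}")
        pc = next_pc
    return regs

MEMORY = {addr: addr for addr in range(0, 256, 4)}   # hypothetical contents
LABELS = {"LOOP": 1}
EXEMPEL7 = [("ADDI", "R2", "R0", 200), ("LW", "R3", 28, "R2"), ("SUBI", "R2", "R2", 4),
            ("ADD", "R1", "R3", "R1"), ("BNEZ", "R2", "LOOP")]
EXEMPEL8 = [("ADDI", "R2", "R0", 200), ("LW", "R3", 28, "R2"), ("SUBI", "R2", "R2", 4),
            ("BNEZ", "R2", "LOOP"), ("ADD", "R1", "R3", "R1")]

print(run_dlx(EXEMPEL7, LABELS, MEMORY, delayed_branch=False)["R1"])
print(run_dlx(EXEMPEL8, LABELS, MEMORY, delayed_branch=True)["R1"])
```

Running exempel8.s with `delayed_branch=False` gives a different R1, because the taken branch then skips the ADD. This is why the scheduling is only valid when Delayed Branch is enabled.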

The compiler cannot always place a useful instruction after a branch instruction. In these cases, an instruction that does not affect the computation must be used. Such an instruction is NOP (No Operation). The following program shows an example of this situation:

          ADDI R2,R0,#200  ;R2 <- 200
          ADD  R1,R0,R0    ;R1 <- 0
    LOOP: ADD  R1,R1,R2    ;R1 <- R1+R2
          SUBI R2,R2,#1    ;R2 <- R2-1
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

Here, the compiler cannot schedule any of the instructions after the branch instruction. Thus, a NOP instruction must be added:

          ADDI R2,R0,#200  ;R2 <- 200
          ADD  R1,R0,R0    ;R1 <- 0
    LOOP: ADD  R1,R1,R2    ;R1 <- R1+R2
          SUBI R2,R2,#1    ;R2 <- R2-1
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
          NOP

Calculate the CPI for the above program. In addition to the result, write down the formula you used. The program is in file exempel9.s. Load and run the program (DLX, Pipelining, Delayed Load, Delayed Branch). Is the resulting CPI the same as the one you calculated?

More sophisticated methods can eliminate the NOP instruction. The following program performs the same function as the previous code, but without the NOP instruction:

          ADDI R2,R0,#200  ;R2 <- 200
          ADD  R1,R0,R0    ;R1 <- 0
          ADD  R1,R1,R2    ;R1 <- R1+R2
    LOOP: SUBI R2,R2,#1    ;R2 <- R2-1
          BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
          ADD  R1,R1,R2    ;R1 <- R1+R2
          SUB  R1,R1,R2    ;R1 <- R1-R2

Two instructions had to be added to the program for it to yield correct results. Mark these instructions in the above code. The CPI is now reduced to one. The cost of this modification is that the number of instructions increases by two. Nevertheless, loops are usually executed more than two times, and each iteration saves one NOP, so the benefits far outweigh the drawbacks. However, not all compilers are able to perform this kind of sophisticated analysis.
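Because the CPI formula counts only non-NOP instructions, a NOP in the delay slot costs a cycle without adding to the instruction count. The sketch below does the calculation. It assumes no other stalls, 4 pipeline-fill cycles, and a prologue of straight-line instructions before the loop. The simulator may also count the final trap 0, so its figure can differ slightly. `delay_slot_cpi` is a hypothetical name.

```python
from fractions import Fraction

def delay_slot_cpi(prologue: int, iterations: int, useful_per_iter: int,
                   nops_per_iter: int, fill: int = 4) -> Fraction:
    """CPI = cycles / non-NOP instructions for a loop whose delay slots hold NOPs.
    Every instruction, NOPs included, takes one cycle after `fill` cycles."""
    instructions = prologue + iterations * useful_per_iter
    cycles = fill + prologue + iterations * (useful_per_iter + nops_per_iter)
    return Fraction(cycles, instructions)

# The NOP version above: 2 set-up instructions, 200 iterations of ADD/SUBI/BNEZ + NOP
cpi_nop = delay_slot_cpi(prologue=2, iterations=200, useful_per_iter=3, nops_per_iter=1)
print(cpi_nop, float(cpi_nop))
```

For long loops the value approaches (useful + NOPs) / useful. That limit is what the per-iteration approximation used earlier gives.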

The program is in file exempel10.s. Load and run the program (DLX, Pipelining, Delayed Load, Delayed Branch). What is the CPI now? Compare the CPI and the execution time with those of the previous program (exempel9.s).

Examine the following program:

            ADDI  R1,R0,#16     ;R1 <- 16
            ADDUI R2,R0,#4777   ;R2 <- 4777
            ADDUI R3,R0,#1326   ;R3 <- 1326
            ADDI  R5,R0,#0      ;R5 <- 0
            ADDUI R6,R0,#32768  ;R6 <- 32768
    FRANK:  AND   R4,R2,R6      ;R4 <- R2 AND R6
            SLLI  R2,R2,#1      ;R2 <- R2<<1
            SLLI  R5,R5,#1      ;R5 <- R5<<1
            SUBI  R1,R1,#1      ;R1 <- R1-1
            BNEZ  R4,ZED        ;if R4<>0 then BRANCH to ZED
            BNEZ  R1,FRANK      ;if R1<>0 then BRANCH to FRANK
            J     END           ;BRANCH to END
    ZED:    ADD   R5,R5,R3      ;R5 <- R5+R3
            BNEZ  R1,FRANK      ;if R1<>0 then BRANCH to FRANK
    END:    TRAP  0             ;end program

What does this program do? If your answer does not easily fit on the following line, it is wrong.

The execution time of the program depends on the initial value of R2 (4777 in the example code). With which R2 value is the execution time maximized, and with which is it minimized? Max: R2 = ____ (hex) = ____ (decimal) Min: R2 = ____ (hex) = ____ (decimal)

The program is in file zorbas1.s. Load and run the program (DLX, Pipelining, Delayed Load, No Delayed Branch). Write the results given by the simulator on the next line. Elapsed Cycles = ____ CPI = ____ Hazards = ____ R5 = ____ (hex)

Modify the program so that all control hazards are eliminated. The execution time and CPI must also improve. Use Delayed Branch. You can reorder, add and remove instructions. The NOP instruction and its look-alikes, such as writing into a register that is never read, must not be used. To test whether the program works correctly, compare the final value of R5 with the value produced by the example code (zorbas1.s). The final values of the other registers are largely irrelevant.
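An instruction-level model of zorbas1.s can help you check your answer and the final R5 of a modified version. It mirrors the listing path by path. It also counts executed instructions and taken control transfers, which shows why the run time depends on R2. This is a hypothetical sketch of the program's behaviour, not of the pipeline. It assumes 32-bit registers and excludes TRAP 0 from the count.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # assumed 32-bit registers

def zorbas(r2: int, r3: int = 1326) -> Tuple[int, int, int]:
    """Follow zorbas1.s instruction by instruction.
    Returns (final R5, executed instructions excluding TRAP 0, taken branches/jumps)."""
    r1, r5, r6 = 16, 0, 0x8000
    executed, taken = 5, 0                  # the five set-up instructions
    while True:
        # FRANK: AND, SLLI, SLLI, SUBI, BNEZ R4,ZED
        r4 = r2 & r6
        r2 = (r2 << 1) & MASK32
        r5 = (r5 << 1) & MASK32
        r1 -= 1
        executed += 5
        if r4 != 0:
            taken += 1                      # BNEZ R4,ZED taken
            r5 = (r5 + r3) & MASK32         # ZED: ADD R5,R5,R3
            executed += 2                   # ADD + BNEZ R1,FRANK
            if r1 != 0:
                taken += 1
                continue
            break                           # falls through to END
        executed += 1                       # BNEZ R1,FRANK
        if r1 != 0:
            taken += 1
            continue
        executed += 1                       # J END
        taken += 1
        break
    return r5, executed, taken

print(zorbas(4777))
```

Running the model for different R2 values and comparing the counts is a quick way to reason about the fastest and slowest inputs before measuring them in the simulator.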

Write the simulation results of the modified code on the next line. Elapsed Cycles = ____ CPI = ____ Hazards = ____ R5 = ____ (hex)

Write the code on the next lines. The instructions preceding the loops do not have to be written unless they have been modified.
