3 RISC Principles

In the last chapter, we presented many details on the processor design space as well as the CISC and RISC architectures. It is time we consolidated our discussion to give details of RISC principles. That's what we do in this chapter. We describe the historical reasons for designing CISC processors. Then we identify the reasons for the popularity of RISC designs. We end our discussion with a list of the principal characteristics of RISC designs.

Introduction

The dominant architecture in the PC market, the Intel IA-32, belongs to the Complex Instruction Set Computer (CISC) design. The obvious reason for this classification is the complex nature of its Instruction Set Architecture (ISA). The motivation for designing such complex instruction sets is to provide an instruction set that closely supports the operations and data structures used by Higher-Level Languages (HLLs). However, the side effects of this design effort are far too serious to ignore.

The decision of CISC processor designers to provide a variety of addressing modes leads to variable-length instructions. For example, instruction length increases if an operand is in memory as opposed to in a register. This is because we have to specify the memory address as part of instruction encoding, which takes many more bits. This complicates instruction decoding and scheduling. The side effect of providing a wide range of instruction types is that the number of clocks required to execute instructions varies widely. This again leads to problems in instruction scheduling and pipelining.

For these and other reasons, in the early 1980s designers started looking at simple ISAs. Because these ISAs tend to produce instruction sets with far fewer instructions, they coined the term Reduced Instruction Set Computer (RISC). Even though the main goal was to reduce the complexity rather than the number of instructions, the term has stuck.

There is no precise definition of what constitutes a RISC design.
However, we can identify certain characteristics that are present in most RISC systems. We identify these RISC design principles after looking at why the designers took the route of CISC in the first place. Because CISC and RISC have their advantages and disadvantages, modern processors take features from both classes. For example, the PowerPC, which follows the RISC philosophy, has quite a few complex instructions.

Evolution of CISC Processors

The evolution of CISC designs can be attributed to the desire of early designers to efficiently use two of the most expensive resources, memory and processor, in a computer system. In the early days of computing, memory was very expensive and small in capacity. This forced the designers to devise high-density code: that is, each instruction should do more work so that the total program size could be reduced. Because instructions are implemented in hardware, this goal could not be achieved until the late 1950s due to implementation complexity. The introduction of microprogramming facilitated cost-effective implementation of complex instructions by using microcode.

Microprogramming has not only aided in implementing complex instructions; it has also provided some additional advantages. Microprogrammed control units use small fast memories to hold the microcode; therefore, the impact of memory access latency on performance could be reduced. Microprogramming also facilitates development of low-cost members of a processor family by simply changing the microcode.

Another advantage of implementing complex instructions in microcode is that the instructions can be tailored to high-level language constructs such as while loops. For example, the loop instruction of the IA-32 can be used to implement for loops. Similarly, memory block copying can be done by using its string instructions. Thus, by using these complex instructions, we close the semantic gap between HLLs and machine languages.

So far, we have concentrated on the memory resource. In the early days, effective processor utilization was also important. High code density also helps improve execution efficiency.
As an example, consider the VAX-11/780, the ultimate CISC processor. It was introduced in 1978 and supported 22 addressing modes as opposed to 11 on the Intel 486 that was introduced more than a decade later. The VAX instruction size can range from 2 to 57 bytes, as shown in Table 3.1.

To illustrate how code density affects execution efficiency, consider the autoincrement addressing mode of the VAX processor. In this addressing mode, a single instruction can read data from memory, add contents of a register to it, write back the result to memory, and increment the memory pointer. Actions of this instruction are summarized below:

    (R2) = (R2) + R3; R2 = R2 + 1

In this example, the R2 register holds the memory pointer. To implement this CISC instruction, we need four RISC instructions:

    R4 = (R2)       ; load memory contents
    R4 = R4+R3      ; add contents of R3
    (R2) = R4       ; store the result
    R2 = R2+1       ; increment memory address

The CISC instruction, in general, executes faster than the four RISC instructions. That, of course, was the reason for designing complex instructions in the first place. However, execution of a single instruction is not the only measure of performance. In fact, we should consider the overall system performance.

Table 3.1 Characteristics of some CISC and RISC processors

                                                  CISC                 RISC
Characteristic                           VAX 11/780   Intel 486     MIPS R4000
Number of instructions                       303          235            94
Addressing modes                              22           11             1
Instruction size (bytes)                     2-57         1-12            4
Number of general-purpose registers           16            8            32

Why RISC?

Designers make choices based on the available technology. As the technology, both hardware and software, evolves, design choices also evolve. Furthermore, as we get more experience in designing processors, we can design better systems. The RISC proposal was a response to the changing technology and the accumulation of knowledge from the CISC designs. CISC processors were designed to simplify compilers and to improve performance under constraints such as small and slow memories. The rest of the section identifies some of the important observations that motivated designers to consider alternatives to CISC designs.

Simple Instructions

The designers of CISC architectures anticipated extensive use of complex instructions because they close the semantic gap. In reality, it turns out that compilers mostly ignore these instructions. Several empirical studies have shown that this is the case. One reason for this is that different high-level languages use different semantics. For example, the semantics of the C for loop is not exactly the same as that in other languages. Thus, compilers tend to synthesize the code using simpler instructions.
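
The VAX autoincrement example given earlier in the chapter shows what such synthesis looks like in practice. The following is a minimal sketch in Python, not a model of any real processor: the dictionary-based register file, the initial register and memory values, and the function names are all assumptions made for illustration. It checks that the four simple steps leave memory and R2 in the same state as the single complex instruction.

```python
# Toy machine state: a register file and a word-addressed memory.
# Register names follow the text; the initial values are made up.

def initial_state():
    regs = {"R2": 100, "R3": 7, "R4": 0}   # R2 is the memory pointer
    mem = {100: 35, 101: 0}
    return regs, mem

def cisc_autoincrement_add(regs, mem):
    """One complex instruction: (R2) = (R2) + R3; R2 = R2 + 1."""
    mem[regs["R2"]] = mem[regs["R2"]] + regs["R3"]
    regs["R2"] += 1

def risc_sequence(regs, mem):
    """The same effect synthesized from four simple steps."""
    regs["R4"] = mem[regs["R2"]]           # R4 = (R2)    load memory contents
    regs["R4"] = regs["R4"] + regs["R3"]   # R4 = R4+R3   add contents of R3
    mem[regs["R2"]] = regs["R4"]           # (R2) = R4    store the result
    regs["R2"] = regs["R2"] + 1            # R2 = R2+1    increment memory address

cisc_regs, cisc_mem = initial_state()
cisc_autoincrement_add(cisc_regs, cisc_mem)

risc_regs, risc_mem = initial_state()
risc_sequence(risc_regs, risc_mem)

print(cisc_mem == risc_mem, cisc_regs["R2"] == risc_regs["R2"])
```

Note that the simple-instruction version needs a scratch register (R4 here) to hold the intermediate value, which is one reason a larger register set pairs naturally with simple instructions.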

Few Data Types

CISC ISAs tend to support a variety of data structures, from simple data types such as integers and characters to complex data structures such as records and structures. Empirical data suggest that complex data structures are used relatively infrequently. Thus, it is beneficial to design a system that supports a few simple data types efficiently and from which the missing complex data types can be synthesized.

Simple Addressing Modes

CISC designs provide a large number of addressing modes. The main motivations are (i) to support complex data structures and (ii) to provide flexibility to access operands. Although this allows flexibility, it also introduces problems. First, it causes variable instruction execution times, depending on the location of the operands. Second, it leads to variable-length instructions. For example, the IA-32 instruction length can range from 1 to 12 bytes. Variable instruction lengths lead to inefficient instruction decoding and scheduling.

Large Register Set

Several researchers have studied the characteristics of procedure calls in HLLs. We quote two studies, one by Patterson and Sequin [22] and the other by Tanenbaum [28]. Several other studies, in fact, support the findings of these two studies.

Patterson and Sequin's study of C and Pascal programs found that procedure call/return constitutes about 12 to 15% of HLL statements. As a percentage of the total machine language instructions, call/return instructions are about 31 to 33%. More interesting is the fact that call/return generates nearly half (about 45%) of all memory references. This is understandable as procedure call/return instructions use memory to store activation records. An activation record consists of parameters, local variables, and return values. In the IA-32, for example, the stack is extensively used for these activities. This explains why procedure call/return activities account for a large number of memory references.
Thus, it is worth providing efficient support for procedure calls and returns.

In another study, Tanenbaum [28] found that only 1.25% of the called procedures had more than six arguments. Furthermore, more than 93% of them had fewer than six local scalar variables. These figures, supported by other studies, suggest that the activation record is not large. If we provide a large register set, we can avoid memory references for most procedure calls and returns. In this context, we note that the eight general-purpose registers available in IA-32 processors are a limiting factor in providing such support. The Itanium, for example, provides a large register set (128 registers), and most procedure calls on the Itanium can completely avoid accessing memory.

RISC Design Principles

The best way to understand RISC is to treat it as a concept to design processors. Although initial RISC processors had fewer instructions compared to their CISC counterparts, the new generation of RISC processors has hundreds of instructions, some of which are as complex as the CISC instructions. It could be argued that such systems are really hybrids of CISC and RISC. In any case, there are certain principles that most RISC designs follow. We identify the important ones in this section.

Simple Operations

The objective is to design simple instructions so that each can execute in one cycle. This property simplifies processor design. Note that a cycle is defined as the time required to fetch two operands from registers, perform an operation, and store the result in a register. The advantage of simple instructions is that there is no need for microcode and operations can be hardwired. If we design the cache subsystem properly to capture these instructions, the overall execution efficiency can be as good as a microcoded CISC machine.

Register-to-Register Operations

A typical CISC instruction set supports register-to-register operations as well as register-to-memory and memory-to-memory operations. The IA-32, for instance, allows register-to-register as well as register-to-memory operations; it does not allow memory-to-memory operations. The VAX 11/780, on the other hand, allows memory-to-memory operations as well.

RISC processors allow only special load and store operations to access memory. The rest of the operations work on a register-to-register basis. This feature simplifies instruction set design as it allows execution of instructions at a one-instruction-per-cycle rate. Restricting operands to registers also simplifies the control unit.

Simple Addressing Modes

Simple addressing modes allow fast address computation of operands.
Because RISC processors employ register-to-register instructions, most instructions use register-based addressing. Only the load and store instructions need a memory-addressing mode. RISC designs provide very few addressing modes: often just one or two. They provide the basic register indirect addressing mode, often allowing a small displacement that is either relative or absolute.

Large Register Set

RISC processors use register-to-register operations; therefore, we need to have a large number of registers. A large register set can provide ample opportunities for the compiler to optimize their usage. Another advantage with a large register set is that we can minimize the overhead associated with procedure calls and returns.

To speed up procedure calls, we can use registers to store local variables as well as for passing arguments. Local variables are accessed only by the procedure in which they are declared. These variables come into existence at the time of procedure invocation and die when the procedure exits.

Fixed-Length, Simple Instruction Format

RISC designs use fixed-length instructions. Variable-length instructions can cause implementation and execution inefficiencies. For example, we may not know if there is another word that needs to be fetched until we decode the first word.

Along with fixed-length instruction size, RISC designs also use a simple instruction format. The boundaries of various fields in an instruction such as opcode and source operands are fixed. This allows for efficient decoding and scheduling of instructions. For example, both PowerPC and MIPS use six bits for opcode specification.

Summary

We have introduced important characteristics that differentiate RISC designs from their CISC counterparts. CISC designs provide complex instructions and a large number of addressing modes. The rationale for this complexity is the desire to close the semantic gap that exists between high-level languages and machine languages. In the early days, effective usage of processor and memory resources was important. Complex instructions tend to minimize the memory requirements. Empirical data, however, suggested that compilers do not use these complex instructions; instead, they use simple instructions to synthesize complex instructions. Such observations led designers to take a fresh look at processor design philosophy. RISC principles, based on empirical studies on CISC processors, have been proposed as an alternative to CISC.
Most of the current processor designs are based on these RISC principles. The next part looks at several RISC architectures.
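
As a closing illustration of the fixed-length, fixed-field instruction format, the sketch below splits a 32-bit instruction word into fields using nothing more than shifts and masks. It is an illustrative sketch under stated assumptions: the chapter itself says only that MIPS uses six bits for the opcode; the remaining field names and widths are those of the MIPS32 register-type format, and the function name decode_r_type is hypothetical.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS register-type word into its fixed-position fields.

    Field layout assumed here (MIPS32 register-type format):
    opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), most significant first.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # top 6 bits, as noted in the text
        "rs":     (word >> 21) & 0x1F,  # first source register
        "rt":     (word >> 16) & 0x1F,  # second source register
        "rd":     (word >> 11) & 0x1F,  # destination register
        "shamt":  (word >> 6) & 0x1F,   # shift amount
        "funct":  word & 0x3F,          # operation selector within opcode 0
    }

# 0x012A4020 is the MIPS32 encoding of: add $t0, $t1, $t2
# ($t0, $t1, $t2 are registers 8, 9, 10).
fields = decode_r_type(0x012A4020)
print(fields)
```

Because every field sits at a fixed bit position, the decoder never has to examine one field to find where the next begins, and it never has to fetch a second word to finish decoding. That is the contrast with variable-length encodings drawn in the chapter.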