Intel's IA-32 Architecture
Cptr280
Dr. Curtis Nelson

History of the Intel 80x86
1971 - Intel invents the microprocessor, the 4004
1975 - 8080 introduced
    - 8-bit microprocessor
1978 - 8086 introduced
    - 16-bit microprocessor
1980 - IBM selects 8088 as basis for IBM PC
    - 8088 is 8-bit external bus version of 8086
1980 - 8087 floating point coprocessor
    - Adds 60 floating point instructions
    - 80-bit floating point registers
    - Uses hybrid stack/register scheme
1982 - 80286 introduced
    - 24-bit address space
    - Memory mapping and protection
1985 - 80386 introduced
    - 32-bit address space
    - 32-bit general-purpose registers
    - New addressing modes
1989 - 80486 introduced
1992 - Pentium introduced
1995 - Pentium Pro introduced
1996 - Pentium with MMX (multimedia) extensions
    - 57 new instructions
    - Primarily for multimedia applications
1997 - Pentium II (Pentium Pro with MMX)
1999 - Pentium III introduced
    - Supports Intel's Internet Streaming SIMD technology
    - Additional multimedia instructions
    - Four 32-bit floating point operations in parallel
    - Useful in speech recognition, video encoding/decoding
2000 - Itanium introduced
    - Release of IA-64 (RISC-like) architecture
    - Explicitly Parallel Instruction Computing (EPIC)
    - 128-bit bundle with three instructions
    - 128 general-purpose registers and 128 floating point registers
    - Developed by a partnership between HP and Intel
    - Able to run both UNIX and Microsoft Windows

The shape of Intel's architecture is the result of the desire for backward compatibility
    - Highly irregular architecture
    - Over 50 million sold per year
2003 - AMD extends the architecture
    - Increases address space to 64 bits
    - Widens all registers to 64 bits
2004 - Intel capitulates
    - Embraces AMD64 (calls it EM64T)
    - Adds more multimedia extensions

Conclusion (Hennessy and Patterson):
    - Intel's processor development history illustrates the impact of the "golden handcuffs" of compatibility
    - "Adding new features as someone might add clothing to a packed bag"
    - "An architecture that is difficult to explain and impossible to love"

IA-32 Overview
Complexity:
    - Instructions from 1 to 17 bytes long
    - One operand must act as both a source and destination
    - One operand can come from memory
    - Complex addressing modes
Saving grace:
    - The most frequently used instructions are not too difficult to build
    - Compilers avoid the portions of the architecture that are slow
IA-32 Registers and Data Addressing
Registers in the 32-bit subset that originated with 80386 (each general-purpose register is 32 bits wide, bits 31 to 0)

    Name      Use
    EAX       GPR 0
    ECX       GPR 1
    EDX       GPR 2
    EBX       GPR 3
    ESP       GPR 4
    EBP       GPR 5
    ESI       GPR 6
    EDI       GPR 7
    CS        Code segment pointer
    SS        Stack segment pointer (top of stack)
    DS        Data segment pointer 0
    ES        Data segment pointer 1
    FS        Data segment pointer 2
    GS        Data segment pointer 3
    EIP       Instruction pointer (PC)
    EFLAGS    Condition codes

IA-32 Instruction Formats
Typical formats (notice the different lengths; field widths are in bits):

    a. JE EIP + displacement
       Fields:  JE (4) | Condition (4) | Displacement (8)
    b. CALL
       Fields:  CALL (8) | Offset (32)
    c. MOV EBX, [EDI + 45]
       Fields:  MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)
    d. PUSH ESI
       Fields:  PUSH (5) | Reg (3)
    e. ADD EAX, #6765
       Fields:  ADD (4) | Reg (3) | w (1) | Immediate (32)
    f. TEST EDX, #42
       Fields:  TEST (7) | w (1) | Postbyte (8) | Immediate (32)
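To make the "different lengths" point concrete, the field widths in the six example formats can be summed to get each instruction's size in bytes. This is a minimal sketch: the widths are copied from the formats above, while the names `FORMATS`, `length_in_bytes`, and `format_lengths` are hypothetical helpers introduced only for illustration.

```python
# Field widths in bits, copied from the six example formats above.
# The dictionary and function names are illustrative, not part of IA-32.
FORMATS = {
    "JE EIP + displacement": [("JE", 4), ("Condition", 4), ("Displacement", 8)],
    "CALL": [("CALL", 8), ("Offset", 32)],
    "MOV EBX, [EDI + 45]": [("MOV", 6), ("d", 1), ("w", 1),
                            ("r/m Postbyte", 8), ("Displacement", 8)],
    "PUSH ESI": [("PUSH", 5), ("Reg", 3)],
    "ADD EAX, #6765": [("ADD", 4), ("Reg", 3), ("w", 1), ("Immediate", 32)],
    "TEST EDX, #42": [("TEST", 7), ("w", 1), ("Postbyte", 8), ("Immediate", 32)],
}


def length_in_bytes(fields):
    """Sum the field widths and convert the total from bits to bytes."""
    total_bits = sum(width for _, width in fields)
    if total_bits % 8 != 0:
        raise ValueError(f"format is not a whole number of bytes: {total_bits} bits")
    return total_bits // 8


def format_lengths():
    """Return a mapping from example instruction to its length in bytes."""
    return {name: length_in_bytes(fields) for name, fields in FORMATS.items()}


if __name__ == "__main__":
    for name, nbytes in format_lengths().items():
        print(f"{name:24s} {nbytes} byte(s)")
```

Under these widths the six examples span 1 byte (PUSH) to 6 bytes (TEST), which already shows why an IA-32 decoder cannot assume a fixed instruction size.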
x86 Operand Types
x86 instructions typically have two operands, where one operand is both a source and a destination operand
Possible combinations include:

    Source/destination type    Second source type
    Register                   Register
    Register                   Immediate
    Register                   Memory
    Memory                     Register
    Memory                     Immediate

    - No memory-memory or immediate-immediate
    - Immediates can be 8, 16, or 32 bits long

80x86 Instructions
    - Data movement (move, push, pop)
    - Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic)
    - Control flow (branches, jumps, calls, returns)
    - String instructions (move and compare)
Floating point:
    - FP data movement (load, load constant, store)
    - Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
    - Comparisons (can send result to ALU)
    - Transcendental functions (sin, cos, log, etc.)
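The operand-type rule stated earlier on this page (one operand is both source and destination, with no memory-memory or immediate-immediate pairs) can be written as a small lookup. This is a sketch under an assumption: the register entries in the set are taken to be the usual IA-32 two-operand combinations, and the names `ALLOWED_PAIRS` and `is_valid_operand_pair` are hypothetical.

```python
# Allowed (source/destination, second source) operand-type pairs.
# Assumption: the register rows follow the usual IA-32 two-operand table.
ALLOWED_PAIRS = {
    ("register", "register"),
    ("register", "immediate"),
    ("register", "memory"),
    ("memory", "register"),
    ("memory", "immediate"),
}


def is_valid_operand_pair(dest, src):
    """Return True if a two-operand instruction may use this pair of types."""
    return (dest, src) in ALLOWED_PAIRS


if __name__ == "__main__":
    print(is_valid_operand_pair("register", "memory"))  # allowed
    print(is_valid_operand_pair("memory", "memory"))    # not allowed
```

An immediate never appears as the first operand because that operand is also the destination, and a constant cannot be written to.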
Top 10 80x86 Instructions

    Rank    Instruction               Integer average (percent of total executed)
    1       load                      22%
    2       conditional branch        20%
    3       compare                   16%
    4       store                     12%
    5       add                       8%
    6       and                       6%
    7       sub                       5%
    8       move register-register    4%
    9       call                      1%
    10      return                    1%
            Total                     96%

Addressing Modes
The x86 offers several different addressing modes for accessing memory:

    Mode                                       Address
    Register indirect                          Address in register (mem[r1])
    Base with displacement                     Address in base register plus displacement (mem[r1+100])
      (8, 16, or 32-bit displacement)
    Base plus scaled index                     Address is Base + 2^scale x Index, scale = 0, 1, 2, or 3
    Base plus scaled index with displacement   Address is Base + 2^scale x Index + disp., scale = 0, 1, 2, or 3
      (8, 16, or 32-bit displacement)
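The four addressing modes above are all special cases of one formula, Base + 2^scale x Index + disp. The sketch below computes that effective address; the function name `effective_address` and the wraparound to 32 bits are assumptions made for illustration, not something the slides specify.

```python
def effective_address(base, index=0, scale=0, disp=0):
    """Compute Base + 2**scale * Index + disp for the modes listed above.

    scale must be 0, 1, 2, or 3, so the index is multiplied by 1, 2, 4, or 8.
    Assumption: the result wraps to 32 bits, as for a 32-bit address.
    """
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    r1 = 0x1000  # hypothetical register contents
    print(hex(effective_address(r1)))                          # register indirect
    print(hex(effective_address(r1, disp=100)))                # base with displacement
    print(hex(effective_address(r1, index=3, scale=2)))        # base plus scaled index
    print(hex(effective_address(r1, index=3, scale=2, disp=8)))  # with displacement
```

With scale = 2 the index is multiplied by 4, which is the natural way to step through an array of 32-bit words starting at the base address.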
80x86 Instruction Format
Instructions vary from 1 to 17 bytes in length

80x86 Length Distribution
[Figure: bar chart of the percentage of instructions at each length, for lengths of 1 to 11 bytes, measured on the Espresso, Gcc, Spice, and NASA7 benchmarks. Horizontal axis: % instructions at each length; vertical axis: length in bytes.]
Pentium Pro vs. MIPS R10000

    Benchmark    Pentium Pro    MIPS R10000    MIPS/Pro
    SPECint95    8.7            8.9            1.02
    SPECfp95     6.0            17.2           2.87

    - The Pentium Pro and MIPS R10000 have comparable performance on integer computations
    - The MIPS R10000 has much better performance than the Pentium Pro for floating point computations

Summary
Instruction complexity is only one variable
    - Lower instruction count vs. higher CPI vs. lower clock rate
Design principles
    - Simplicity favors regularity
    - Smaller is faster
    - Good design demands compromise
    - Make the common case fast
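The MIPS/Pro column and the instruction count, CPI, and clock rate trade-off can both be checked with a few lines of arithmetic. The sketch below recomputes the ratios from the SPEC scores in the table and applies the standard relation CPU time = instruction count x CPI / clock rate. The function names are hypothetical, and the two machines in the demo use made-up numbers chosen only to show the trade-off.

```python
# SPEC scores from the table above: (Pentium Pro, MIPS R10000).
SPEC_SCORES = {
    "SPECint95": (8.7, 8.9),
    "SPECfp95": (6.0, 17.2),
}


def mips_over_pro(benchmark):
    """Ratio of the MIPS R10000 score to the Pentium Pro score, to 2 places."""
    pro, mips = SPEC_SCORES[benchmark]
    return round(mips / pro, 2)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    for name in SPEC_SCORES:
        print(name, mips_over_pro(name))

    # Hypothetical machines (numbers invented for illustration only):
    # A runs fewer, more complex instructions; B runs more, simpler ones.
    time_a = cpu_time(instruction_count=80e6, cpi=2.5, clock_rate_hz=200e6)
    time_b = cpu_time(instruction_count=100e6, cpi=1.5, clock_rate_hz=200e6)
    print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
```

In the made-up comparison, machine A executes 20% fewer instructions yet takes longer, because its higher CPI outweighs the lower count. That is the sense in which instruction complexity is only one variable.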