What Transitioning from 32-bit to 64-bit x86 Computing Means Today
Chris Wanner, Senior Architect, Industry Standard Servers, Hewlett-Packard
© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Agenda
- What and why of 64-bit computing
- Intel EM64T vs. AMD64
- x86 64-bit extensions vs. Itanium 2
- Transition to 64-bit computing
64-bit processors
x86-64 extensions bring 64-bit computing to the volume, mainstream industry-standard market.
[Figure: timeline (1990 to 2000s) of 64-bit processor families]
- x86-64 extensions: Opteron, Xeon
- IPF: Merced, McKinley, Madison
- POWER: Power 3, Power 4, Power 5
- PA-RISC: 8000, 8500, 8700, 8800
- SPARC: UltraSPARC, UltraSPARC II, UltraSPARC III, UltraSPARC IV
- Alpha: EV4, EV5, EV6, EV7
- MIPS: R4K, R8K, R10K, R12K, R14K, R16K
What and why of 64-bit computing?
It's about:
- Data handling
- Memory addressability
Data handling
- Registers
- Datapaths
- Arithmetic units
What size chunks can we use to move and manipulate data?
What is the benefit of being able to use larger chunks of data?
- Higher performance
- Greater accuracy
- 64-bit arithmetic vs. 32-bit
- 64-bit logical operations vs. 32-bit
- 64-bit floating-point operations vs. 32-bit
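To make "64-bit arithmetic vs. 32-bit" concrete, here is a minimal Python sketch that emulates an unsigned add in a register of a given width by masking the result to that width. The helper `add_unsigned` is hypothetical, not a real API. With a 32-bit register, a sum larger than 2^32 - 1 wraps around, so software has to split the work into several instructions with carry handling. A 64-bit register does it in one operation.

```python
def add_unsigned(a: int, b: int, bits: int) -> tuple[int, bool]:
    """Add two unsigned integers in a register `bits` wide.

    Returns the wrapped result and whether a carry out (overflow) occurred.
    """
    mask = (1 << bits) - 1
    full = a + b
    return full & mask, full > mask


# 3,000,000,000 + 2,000,000,000 = 5,000,000,000 does not fit in 32 bits
a, b = 3_000_000_000, 2_000_000_000
print(add_unsigned(a, b, 32))  # (705032704, True): wrapped, carry lost
print(add_unsigned(a, b, 64))  # (5000000000, False): exact result
```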
Data handling: register size
[Figure: x86 register width over time, 1970 to 2000s, through 4-, 8-, 16-, 32- and 64-bit generations, with intervals of 3, 4, 7 and 18+ years between them]
32-bit computing fueled the growth of the Industry Standard Server market; 64-bit computing will continue to feed the need for higher levels of performance.
Data handling: register size
But this is tempered by the fact that 32-bit processors in Industry Standard Servers can already move and compute data in chunks larger than 32 bits:
- 512-bit (64-byte) cache lines
- 64-bit front-side bus
- 64-bit, 128-bit, and even 256-bit internal datapaths
- 80-bit x87 FPU registers, 64-bit MMX registers, and 128-bit XMM registers for SIMD floating-point and integer operations (SSE2)
There would be little need for a true 64-bit processor if data size were the only reason.
Data handling: register quantity
It's not just about the width of registers; it's also about the number of registers:
- 64-bit processors typically have more registers than 32-bit processors.
- More registers can mean more performance, because registers are faster than cache or memory.
- More registers let more data be held close to the CPU core and used without stalling the CPU.
- Example: IPF has 128 general-purpose registers (GPRs) vs. 8 GPRs for IA-32.
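One rough way to see the register-quantity argument is to compare how many bytes of data each general-purpose register file can hold at once. This Python sketch is illustrative only. The register counts come from this deck (8 GPRs for IA-32, R8 to R15 added to the 8 legacy registers on x86-64, 128 for IPF). It ignores the fact that some registers, such as the stack pointer, are not free to hold general data.

```python
# Architectural general-purpose register files: (count, width in bits).
# Counts follow this deck; widths are the architectural integer register widths.
GPR_FILES = {
    "IA-32": (8, 32),
    "x86-64": (16, 64),
    "IPF (Itanium)": (128, 64),
}


def gpr_bytes(count: int, width_bits: int) -> int:
    """Total bytes of data the general-purpose registers can hold at once."""
    return count * width_bits // 8


for name, (count, width) in GPR_FILES.items():
    print(f"{name:14} {count:4} x {width}-bit = {gpr_bytes(count, width):5} bytes")
```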
Data handling: register quantity
But even though the basic IA-32 ISA specifies only 8 GPRs, additional special-purpose registers are available through the x87, MMX, and SSE extensions.
So there must be more to 64-bit computing than register width and count.
What and why of 64-bit computing?
It's about:
- Data handling
- Memory addressability
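Memory addressability
How much memory a CPU can address depends on its bitness (in principle; actual implementations may expose fewer address bits):
Address range $= 2^{\text{bitness}}$ bytes
Thus:
- $2^{16}$ = 64 KB
- $2^{32}$ = 4 GB (32-bit processors: 32-bit address range)
- $2^{64}$ = 16 EB, exabytes (64-bit processors: 64-bit address range)
The 64-bit address range is $2^{32}$, roughly 4,000,000,000, times larger than the 32-bit range.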
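The address-range arithmetic above can be checked with a few lines of Python. The 40-bit row is included because the 1 TB memory-addressing figure quoted for Xeon/Opteron later in this deck equals 2^40 bytes. The helper names are hypothetical, and the units are binary (1 KB = 1024 bytes), which is how the slide uses them.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def address_range(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def human(n_bytes: int) -> str:
    """Format a power-of-two byte count using binary (1024-based) units."""
    value, unit = n_bytes, 0
    while value >= 1024 and value % 1024 == 0 and unit < len(UNITS) - 1:
        value //= 1024
        unit += 1
    return f"{value} {UNITS[unit]}"


for bits in (16, 32, 40, 64):
    print(bits, human(address_range(bits)))  # 64 KB, 4 GB, 1 TB, 16 EB

ratio = address_range(64) // address_range(32)
print(f"64-bit range / 32-bit range = {ratio:,}")  # 4,294,967,296
```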
How important is a larger address space?
"No one will need more than 640K of memory." (urban-legend quote attributed to Bill Gates)
Addressability over time
[Figure: addressable memory over time, 1970 to 2000s (log scale, 1 byte to 1 TB), with intervals of 3, 4, 7 and 18+ years between generations]
Who needs more than 4 GB of memory?
A: A growing number of applications require more than 4 GB of memory.
Memory addressability
Consider:
- Today the 4 GB address space is shared by the OS kernel, library routines, and applications, so applications get only 2 GB to 3 GB of space.
- Server consolidation solutions, where a number of applications share the available memory space, are becoming more prevalent across the industry, driven by greater CPU power and the need to reduce TCO. Here virtual address space may be even more important than physical.
- Database applications that keep more data in memory rather than on disk reduce database delays by orders of magnitude.
- Email applications require memory resources for each supported user: more memory = more supported users.
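The "2 GB to 3 GB" figure comes from how a 32-bit process's 4 GB virtual address space is split between the kernel and user space. Windows defaults to 2 GB/2 GB, and the /3GB boot option gives applications 3 GB. Typical 32-bit Linux kernels use a 3 GB/1 GB split. This Python sketch only does that subtraction. The helper name is hypothetical, and shared libraries then consume part of whatever the application is left with.

```python
GiB = 1 << 30
ADDRESS_SPACE_32 = 4 * GiB  # total virtual address space of a 32-bit process


def user_space(kernel_reserved: int) -> int:
    """Virtual address space left for the application after the kernel's share."""
    if not 0 <= kernel_reserved <= ADDRESS_SPACE_32:
        raise ValueError("kernel share must fit in the 32-bit address space")
    return ADDRESS_SPACE_32 - kernel_reserved


# 2 GB kernel share (Windows default) vs. 1 GB (Windows /3GB, typical Linux)
for kernel_gb in (2, 1):
    app_gb = user_space(kernel_gb * GiB) // GiB
    print(f"kernel {kernel_gb} GB -> application {app_gb} GB")
```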
Memory addressability
These and many other solutions can benefit from a larger address space:
- More memory = more performance
- More memory = more capabilities
- More memory = more reliability and availability
These are not new concepts in computing, but x86 64-bit extensions move these capabilities into the volume, industry-standard computing space.
What took so long?
Memory capacity and pricing trends
- More than 4 GB of memory in a typical Industry Standard Server has not been practical during the past 10 years.
- The expense of more than 4 GB has not been economical until recently.
[Figures: typical server memory capacity (GB), 1994 to 2002, marked "not practical" and "practical"; cost of more than 4 GB of memory (log scale, $100 to $100,000), 1994 to 2002, marked "not economical" and "economical"]
Memory barriers removed
It is now both practical and economical to have large memory capacities in volume servers, which makes 64-bit computing in the form of x86 64-bit extensions viable and important.
x86 64-bit Extensions
Questions?
64-bit Extensions Architectures
What?
- Intel: EM64T (Extended Memory 64 Technology)
- AMD: AMD64 (AMD's x86 64-bit technology)
- Microsoft: x64 (Microsoft's term for x86 64-bit extensions)
64bit extensions registers & instructions
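x86 to x86 64-bit extensions: registers
- General-purpose registers: the eight 32-bit registers EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI (with 16-bit views AX, BX, CX, DX, SP, BP, SI, DI and 8-bit views such as AH and AL) are widened to 64 bits (RAX and so on), and eight new 64-bit registers, R8 to R15, are added.
- SSE and SSE2: the 128-bit registers XMM0 to XMM7 are joined by XMM8 to XMM15.
- x87/MMX: the 80-bit registers MMX0/FPR0 to MMX7/FPR7 are unchanged.
- Program counter: IP (16-bit) became EIP (32-bit) and is widened to the 64-bit RIP.
The 64-bit extensions are the latest in a series of changes to the x86 architecture made over the last 20+ years.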
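The register diagram implies that the legacy names are views of the wider registers. Below is a minimal Python sketch of that aliasing for RAX, with hypothetical helper names. It also models one 64-bit-mode rule that matters when porting code. Writing a 32-bit register such as EAX zero-extends the result into the full 64-bit register. An 8- or 16-bit write leaves the upper bits unchanged.

```python
MASK64 = (1 << 64) - 1


def views(rax: int) -> dict[str, int]:
    """Return the legacy sub-register views of a 64-bit RAX value."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,   # bits 31..0
        "AX": rax & 0xFFFF,         # bits 15..0
        "AH": (rax >> 8) & 0xFF,    # bits 15..8
        "AL": rax & 0xFF,           # bits 7..0
    }


def write_eax(value: int) -> int:
    """64-bit mode: a 32-bit write zero-extends, so the old RAX is irrelevant."""
    return value & 0xFFFF_FFFF


def write_ax(rax: int, value: int) -> int:
    """A 16-bit write replaces bits 15..0 and preserves bits 63..16."""
    return ((rax & MASK64) & ~0xFFFF) | (value & 0xFFFF)


v = views(0x1122334455667788)
print({name: hex(x) for name, x in v.items()})
print(hex(write_ax(0x1122334455667788, 0xABCD)))  # 0x112233445566abcd
```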
x86 extensions: 10 new instructions
- CDQE (AMD: supported; Intel: supported): new mnemonic for an existing opcode
- CMPSQ (AMD: supported; Intel: supported): new mnemonic for an existing opcode
- LODSQ (AMD: supported; Intel: supported): new mnemonic for an existing opcode
- MOVSQ (AMD: supported; Intel: supported): new mnemonic for an existing opcode
- STOSQ (AMD: supported; Intel: supported): new mnemonic for an existing opcode
- MOVZX (AMD: supported; Intel: supported): 64-bit version of an existing instruction
- SYSCALL (AMD: supported in all modes; Intel: 64-bit mode only): new for Intel, in 64-bit mode only
- SYSRET (AMD: supported in all modes; Intel: 64-bit mode only): new for Intel, in 64-bit mode only
- CMPXCHG16B (AMD: not supported; Intel: supported): AMD64 has only the 8-byte version
- SWAPGS (AMD: supported; Intel: supported): new
Minor differences between the implementations of the 64-bit extensions are expected to be handled by compilers and OSs, transparently to the end user: different platforms, but a single binary.
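The slide expects implementation differences such as CMPXCHG16B support to be absorbed by compilers and operating systems. One common way software does this is to check a CPU feature flag at run time and pick a code path, so a single binary runs on both processors. That is an assumption here, not a description of any particular compiler or OS. On Linux, /proc/cpuinfo reports CMPXCHG16B support as the `cx16` flag. The function names and returned strategy names below are hypothetical.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from text in the format of Linux /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def choose_cas16(flags: set[str]) -> str:
    """Pick a 16-byte compare-and-swap strategy (strategy names are hypothetical)."""
    if "cx16" in flags:       # CPU implements CMPXCHG16B
        return "cmpxchg16b"
    return "lock-fallback"    # e.g. protect the 16-byte update with a lock


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(choose_cas16(cpu_flags(f.read())))
    except OSError:
        print("no /proc/cpuinfo; assuming", choose_cas16(set()))
```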
32-bit and 64-bit modes
- Legacy mode (existing software infrastructure): 32-bit applications, 32-bit OS kernel, 32-bit drivers.
- Long mode, compatibility: 32-bit applications on a 64-bit OS kernel (via thunking*), with 64-bit drivers. This allows users to move to 64-bit without giving up 32-bit compatibility or performance.
- Long mode, native 64-bit: 64-bit applications, 64-bit OS kernel, 64-bit drivers. This is a full 64-bit environment.
* Windows: thunking/DLL; Linux: system call emulation
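Thunking, or system call emulation, exists because a 32-bit application hands the 64-bit kernel data laid out with 4-byte longs and pointers, while the kernel expects its own layout. Below is a minimal Python sketch of widening one such record. The record itself is hypothetical. It uses the LP64 model of 64-bit Linux, where longs and pointers are both 8 bytes. Windows x64 uses LLP64, where `long` stays 4 bytes, so the real conversion differs per OS. Signed fields are sign-extended and pointers are zero-extended.

```python
import struct

# Hypothetical request passed from a 32-bit application to a 64-bit kernel:
# a signed C 'long' offset followed by a pointer.
LAYOUT_32 = struct.Struct("<iI")  # ILP32: long = 4 bytes, pointer = 4 bytes
LAYOUT_64 = struct.Struct("<qQ")  # LP64 (64-bit Linux): long = 8, pointer = 8


def thunk_32_to_64(raw32: bytes) -> bytes:
    """Widen a 32-bit request into the 64-bit kernel's layout."""
    offset, ptr = LAYOUT_32.unpack(raw32)
    # 'i' unpacks as a signed int, so repacking with 'q' sign-extends it;
    # 'I' unpacks as unsigned, so repacking with 'Q' zero-extends the pointer.
    return LAYOUT_64.pack(offset, ptr)


raw = LAYOUT_32.pack(-1, 0x8000_0000)
wide = thunk_32_to_64(raw)
print(len(raw), len(wide), LAYOUT_64.unpack(wide))  # 8 16 (-1, 2147483648)
```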
Ecosystem Support for x86 64bit Extensions OS & Applications
OS and applications
- Transition from x86 16-bit to 32-bit: more than 8 years from the 80386 release to Windows NT 3.1 and Windows 95.
- Transition from x86 32-bit to 64-bit: less than 1 year from the Opteron/AMD64 release to SuSE SLES 8 and Red Hat EL3, and 2 years to a Microsoft x86 64-bit OS.
64-bit OS support has arrived significantly faster than in the last major transition.
OS support
Platforms: 32-bit x86, 64-bit IPF, x86-64. Status legend: available now; expected release 1H05.
- Linux products: Red Hat Enterprise Linux 3; SuSE Linux Enterprise Server 9
- Microsoft products: Windows XP 64-bit Edition; Windows Server 2003 Web, Standard, Enterprise, and Datacenter Editions
Application support
[Figure: number of x86 64-bit applications (AMD64 and EM64T, on Linux OSs; shipped vs. in development), Q1'03 to Q3'04, y-axis 0 to 350]
- Development tools: e.g., GNU and C++ compilers, debuggers, profilers, libraries
- Database engines: e.g., SQL, Oracle 8i and 9i, MySQL
- Infrastructure applications: e.g., VMware, Zeus web server, .NET environment
- Vertical applications: e.g., Synopsys, Cadence, Fluent, Matlab
x86 64-bit extensions vs. Itanium 2
- Architecture: significant differences
- Instruction set: significant differences
- Positioning: significant differences
Xeon/Opteron compared to Itanium 2 (Xeon/Opteron vs. Itanium 2 processor)
- Memory addressing: 1 TB vs. 1024 TB
- System bus bandwidth: 6.4 GB/s and 20 GB/s vs. 6.4 GB/s
- On-die cache: 1 MB and 4 MB vs. 6 MB
- Pipeline stages: 12 and 31 vs. 8
- Issue ports: 6 vs. 11
- On-die registers: 40 vs. 264 application registers + 64 predicate registers
- Execution units: 3 integer, Fmisc, Fmul, Fadd, 1 for SIMD, 2 load or 2 store vs. 6 integer, 3 branch, 2 FP (FMAC), 1 SIMD, 2 load and 2 store
- Core frequency: 2.2 GHz and 3.2+ GHz vs. 1.5 GHz
- Instructions per cycle: 3 vs. 6
Positioning x86 64-bit extensions vs. IPF
[Figure: HP server positioning by system size (1-4, 4-8, and 8-64+ processors) across ProLiant, Integrity, and NonStop servers. Workloads shown include web, mail, infrastructure services (caching, proxy, directory, DNS, firewall, security), messaging, work-group BI, business intelligence and SCM planning, the application tier, medium and large OLTP, medium and large ERP, BI on very large data sets, HPC, and large SMP, large-memory systems.]
For customers who need the highest levels of performance and scalability for the most demanding applications and enterprise environments, Itanium architecture and HP Integrity servers are the solutions of choice.
Positioning (continued)
[Figure: breadth of applications vs. scalability for 32-bit x86, x86-64, and 64-bit IPF]
Transitioning to 64bits
32-bit to 64-bit transitioning
Lessons learned with Itanium:
- Some applications port extremely well.
- Others are a huge burden, especially 16-bit code and assembly code.
- Be judicious about what to port and what not to port.
- Some applications benefit from 64-bit; others run slower in 64-bit mode.
- The 64-bit extensions give you the flexibility to port only the applications where porting makes sense; the rest can stay 32-bit.
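The slide does not say why some applications run slower in 64-bit mode. One commonly cited reason is that pointers double in size, so pointer-heavy data structures take more cache and memory bandwidth. That is an assumption here, not the author's stated explanation. This Python sketch computes C struct sizes using the natural-alignment rules typical of ILP32 and LP64 ABIs. The linked-list node is hypothetical.

```python
def c_struct_size(fields: list[tuple[int, int]]) -> int:
    """Size of a C struct given (size, alignment) per field, with natural padding."""
    offset, max_align = 0, 1
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # pad up to field alignment
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


def list_node(pointer_size: int) -> list[tuple[int, int]]:
    """Hypothetical doubly linked list node: int key, next pointer, prev pointer."""
    return [(4, 4), (pointer_size, pointer_size), (pointer_size, pointer_size)]


CACHE_LINE = 64  # bytes, i.e. the 512-bit cache line mentioned earlier
for label, ptr in (("32-bit", 4), ("64-bit", 8)):
    size = c_struct_size(list_node(ptr))
    print(f"{label}: node = {size} bytes, {CACHE_LINE // size} nodes per cache line")
```

Under these assumptions the node grows from 12 to 24 bytes, so a cache line holds 2 nodes instead of 5. This is one way a straight recompile can lose performance.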
What applications should port to x86-64?
- Database: Many database apps are memory-bound in a 32-bit environment and benefit greatly from a larger physical address space, possibly even running the entire database from memory rather than from disk.
- Email: A larger address space allows a server to support a much larger number of users, which means fewer servers and lower TCO.
- Terminal Server: A 64-bit environment avoids kernel address-space limitations when hosting multiple applications. Example: hosting Microsoft Office on Terminal Server in a 64-bit environment can support 50% more users than in a 32-bit environment.
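The email argument is essentially a division: if each user needs a roughly fixed amount of memory, users per server scales with usable memory. In this Python sketch all the numbers are hypothetical placeholders chosen only to show the shape of the calculation. They are not measurements. Real servers do not scale perfectly linearly, and the 50% Terminal Server figure above comes from the slide, not from this model.

```python
def users_per_server(usable_bytes: int, per_user_bytes: int) -> int:
    """How many users fit if each needs a fixed amount of memory."""
    return usable_bytes // per_user_bytes


GiB, MiB = 1 << 30, 1 << 20
PER_USER = 2 * MiB    # hypothetical per-user working set
USABLE_32 = 2 * GiB   # hypothetical: application share of a 32-bit address space
USABLE_64 = 16 * GiB  # hypothetical: installed memory usable by a 64-bit process

print(users_per_server(USABLE_32, PER_USER))  # 1024
print(users_per_server(USABLE_64, PER_USER))  # 8192
```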
What applications should port to x86-64?
- Business apps: apps with high memory requirements; apps with high computational requirements
- Technical/scientific computing: need for large virtual and physical address spaces; complex computations
These requirements also justify porting to 64-bit IPF; it's a matter of degree:
- Low/medium requirements: x86 64-bit extensions
- High requirements: Itanium 2 processor
Backup: Opteron ecosystem support