Intel Hyper-Threading technology

Similar documents
QuickSpecs. HP NC6170 PCI-X Dual Port 1000SX Gigabit Server Adapter. Overview. Retired

Migrating from E/X5600 Processors

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS

HP D6000 Disk Enclosure Direct Connect Cabling Guide

HP ProLiant DL580 Generation 2 and HP ProLiant ML570 Generation 2 Server Hot-Add Memory. July 2003 (Second Edition) Part Number

Assessing performance in HP LeftHand SANs

QuickSpecs. PCIe Solid State Drives for HP Workstations

HP EVA P6000 Storage performance

HP SAS benchmark performance tests

Best Practices for Setting BIOS Parameters for Performance

HP Power Calculator Utility: a tool for estimating power requirements for HP ProLiant rack-mounted systems

Hyper-Threading Performance with Intel CPUs for Linux SAP Deployment on ProLiant Servers. Session #3798. Hein van den Heuvel

QuickSpecs. HP Serial-ATA (SATA) Hard Drive Option Kits. Overview

QuickSpecs. Integrated NC7782 Gigabit Dual Port PCI-X LOM. Overview

QuickSpecs. What's New New 146GB Pluggable Ultra320 SCSI 15,000 rpm Universal Hard Drive. HP SCSI Ultra320 Hard Drive Option Kits (Servers) Overview

Designing high-availability solutions using HP Integrity Virtual Machines as HP Serviceguard packages

HP NC7771 PCI-X 1000T

QuickSpecs. What's New HP 120GB 1.5Gb/s SATA 5400 rpm SFF HDD. HP Serial-ATA (SATA) Hard Drive Option Kits. Overview

Models Compaq NC3123 Fast Ethernet Adapter PCI, 10/100 WOL B21 Single B22 5-Pack

QuickSpecs. NC7771 PCI-X 1000T Gigabit Server Adapter. HP NC7771 PCI-X 1000T Gigabit Server Adapter. Overview

Best Practices When Deploying Microsoft Windows Server 2008 R2 or Microsoft Windows Server 2008 SP2 on HP ProLiant DL980 G7 Servers

Models Smart Array 6402/128 Controller B21 Smart Array 6404/256 Controller B21

Smart Array Controller technology: drive array expansion and extension technology brief

HP SmartCache technology

Table of Contents. HP A7173A PCI-X Dual Channel Ultra320 SCSI Host Bus Adapter. Performance Paper for HP PA-RISC Servers

Installing Windows Vista TM Business on HP Compaq Business 4400, 6300, 7300, 7400 notebook models

The HP 3PAR Get Virtual Guarantee Program

Performance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand. Abstract

Introduction...2. Executive summary...2. Test results...3 IOPs...3 Service demand...3 Throughput...4 Scalability...5

HP 3PAR OS MU1 Patch 11

Retired. HP NC7771 PCI-X 1000T Gigabit Server Adapter Overview

HP Advanced Memory Protection technologies

Target Environments The Smart Array 6i Controller offers superior investment protection to the following environments: Non-RAID

Retired. ProLiant iscsi Acceleration Software Pack for Embedded Multifunction Server Adapters Overview

QuickSpecs. What's New 300GB 6G SAS 15K rpm SFF (2.5in) Enterprise Hard Drive. HP SAS Enterprise and SAS Midline Hard Drives.

QuickSpecs. HP StorageWorks ds2110. HP StorageWorks ds2110. Description HP StorageWorks Disk System 2110 (ds2110)

HP ProLiant Essentials RDMA for HP Multifunction Network Adapters User Guide

HP Operations Orchestration

External Media Cards User Guide

HPE Datacenter Care for SAP and SAP HANA Datacenter Care Addendum

Models HP NC320T PCI Express Gigabit Server Adapter B21

Computer Setup (F10) Utility Guide HP Compaq d220 and d230 Microtower

Configuring RAID with HP Z Turbo Drives

QuickSpecs SmartStart TM Release 6.0

Retired. For more information on HP's ProLiant Security Server visit:

HP and CATIA HP Workstations for running Dassault Systèmes CATIA

Business white paper HP and Bentley MicroStation V8i (SELECTseries3) Business white paper. HP and Bentley. MicroStationV8i (SELECTseries3)

Models HP Security Management System XL Appliance with 500-IPS System License

External Devices. User Guide

QuickSpecs. Models HP I/O Accelerator Options. HP PCIe IO Accelerators for ProLiant Servers. Overview

AMD: WebBench Virtualization Performance Study

HP Business Availability Center

QuickSpecs. HP SAS Hard Drives. Overview

QuickSpecs. What's New HP 1.2TB 6G SAS 10K rpm SFF (2.5-inch) SC Enterprise 3yr Warranty Hard Drive

Intelligent Provisioning 3.10 Release Notes

HP AutoPass License Server

HP 3PAR OS MU3 Patch 18 Release Notes

Models HP USB Standard Duty Cash Drawer

HP Dynamic Deduplication achieving a 50:1 ratio

QuickSpecs. Models HP SC11Xe Host Bus Adapter B21. HP SC11Xe Host Bus Adapter. Overview

ProLiant DL F100 Integrated Cluster Solutions and Non-Integrated Cluster Bundle Configurations. Configurations

Overview of HP tiered solutions program for Microsoft Exchange Server 2010

QuickSpecs. What's New. Models. ProLiant Essentials Server Migration Pack - Physical to ProLiant Edition. Overview

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS

Intelligent Provisioning 1.70 Release Notes

Intelligent Provisioning 1.64(B) Release Notes

HPE 3PAR OS MU5 Patch 49 Release Notes

QuickSpecs. HP Serial-ATA (SATA) Entry (ETY) and Midline (MDL) Hard Drive Option Kits Overview

HP Visual Collaboration Room. User Guide

Table of contents. OpenVMS scalability with Oracle Rdb. Scalability achieved through performance tuning.

QuickSpecs. Models. HP NC380T PCI Express Dual Port Multifunction Gigabit Server Adapter. Overview

QuickSpecs. Models. HP NC110T PCI Express Gigabit Server Adapter. Overview. Retired

QuickSpecs. Models. HP StorageWorks 8Gb PCIe FC HBAs Overview

External Devices User Guide

NOTE: For additional information regarding the different features of SATA Entry drives please visit the SATA hard drive website:

HP Velocity User Guide for Thin Clients

HP ALM Client MSI Generator

QuickSpecs. HPE Workload Aware Security for Linux. Overview

HP ProLiant BL35p Server Blade

HPE Moonshot ilo Chassis Management Firmware 1.52 Release Notes

Marvell BIOS Utility User Guide

Internal Cabling Guide for the HP Smart Array 6400 Series Controller on an HP Integrity Server rx7620

HP and NX. Introduction. What type of application is NX?

HP Operations Orchestration

Intelligent Provisioning 3.00 Release Notes

HP StorageWorks 4000/6000/8000 Enterprise Virtual Array connectivity for Sun Solaris installation and reference guide

QuickSpecs. Models. Overview

Newest generation of HP ProLiant DL380 takes #1 position overall on Oracle E-Business Suite Small Model Benchmark

Sales Certifications

Simultaneous Multithreading on Pentium 4

QuickSpecs. Models. HP NC510C PCIe 10 Gigabit Server Adapter. Overview

File Server Comparison: Executive Summary. Microsoft Windows NT Server 4.0 and Novell NetWare 5. Contents

QuickSpecs. Models. HP Smart Array P400i Controller. Overview

NOTE: A minimum of 1 gigabyte (1 GB) of server memory is required per each NC510F adapter. HP NC510F PCIe 10 Gigabit Server Adapter

Deploying Microsoft Exchange Server 2007 mailbox roles on VMware Infrastructure 3 using HP ProLiant servers and HP StorageWorks

Available Packs and Purchase Information

Configuring the HP StorageWorks Modular Smart Array 1000 and 1500cs for external boot with Novell NetWare New Installations

HP StorageWorks Enterprise Virtual Array 4400 to 6400/8400 upgrade assessment

QuickSpecs. HP StorageWorks Command View SDM. Models. Models Feature List

QLE10000 Series Adapter Provides Application Benefits Through I/O Caching

Transcription:

Intel Hyper-Threading technology technology brief Abstract... 2 Introduction... 2 Hyper-Threading... 2 Need for the technology... 2 What is Hyper-Threading?... 3 Inside the technology... 3 Compatibility... 4 Issues with Hyper-Threading... 4 Applications... 4 OS... 5 Licensing... 5 Performance gains... 6 Conclusion... 7 For more information... 8 Call to action... 8

Abstract Intel s introduction of Hyper-Threading technology represents a significant improvement in processor utilization and performance. This technology boosts system performance without going to a higher clock rate or adding more processors. This improvement is achieved by making multiple instruction streams, called threads, internally available to a single processor at the same time. These threads allow the processor the opportunity to better schedule the use of internal resources and improve utilization. HP servers offer this new technology by making use of Intel Xeon processors, which incorporate Hyper-Threading. This technology brief describes the Hyper-Threading concept, as well as its benefits and limitations for the user. Some HP performance test results are also included to show the improvement seen by the use of Hyper-Threading. Introduction Managers of today s enterprise environments continually face the need to lower costs and improve performance. A straightforward means to improve performance has always been to increase processor speeds, but this typically comes at a higher cost. To find a solution that attacks both cost and performance at the same time requires an improvement in the utilization of existing resources. One of Intel s solutions in this regard was to focus on threading at the system level through software. Using parallel threads of instructions has worked well in multiprocessor systems, since different processors can simultaneously operate on different threads. Taking this a step further, Intel s Hyper- Threading solution offers a unique approach by providing thread-level parallelism on each individual processor. This technology brief discusses the need for Hyper-Threading, the basics of the concept, the benefits it provides, issues that have arisen in regard to software licensing, and some results that HP has seen in laboratory tests. Hyper-Threading This new technology from Intel enables multi-threaded applications to execute threads in parallel on each individual processor. Available on Intel Xeon processors, Hyper-Threading provides the user with increased computing power to meet the needs of today s server applications. Need for the technology Improving processor utilization has been an industry goal for years. Processor speeds have advanced until a typical processor today can run at frequencies over 2 gigahertz, but much of the rest of the system is not capable of running at that speed. To enable performance improvements, memory caches have been integrated into the processor to minimize the long delays that can result from accessing main memory. Xeon processors, for example, now include three cache levels on the die. Large server-based applications tend to be memory intensive due to the difficulty of predicting access patterns. The working data sets are also quite large. These two things can create bottlenecks, regardless of memory prefetching techniques. Latency due to these bottlenecks only gets worse when pointer-intensive applications are executed. Any mistake in prediction can force a pipeline to be cleared, incurring a delay to refill this data. It is this latency that drives processor utilization down. Despite improvements in application development and parallel processing implementations, reaching higher utilization rates remained an unmet goal. 2

What is Hyper-Threading? Hyper-Threading Technology enables one physical processor to execute two separate threads at the same time. To achieve this, Intel designed the Xeon processor with the usual processor core, but with two Architectural State devices (see Figure 1). Each Architectural State (AS) tracks the flow of a thread being executed by core resources. After power-up and initialization, these two internal Architectural States define two logical processors. Individually they can be halted, interrupted, or can execute a specific thread independently of the other logical processor. Each AS has an instruction pointer, advanced programmable interrupt controller (APIC) registers, general-purpose registers, and machine state registers. The two logical processors then share the remaining physical execution resources. An application or operating system (OS) can submit threads to two different logical processors just as it would in a traditional multiprocessor system. Figure 1. Conceptual illustration of Hyper-Threading IA-32 processor with Hyper-Thread technology traditional dual-processor (DP) system AS1 AS2 AS AS processor core processor core processor core two logical processors share a single processor core system bus logical processor two separate physical processors system bus Inside the technology Looking inside the processor we find that the core contains subsystems to enhance performance. These subsystems control program execution, perform instruction fetching, integrate the on-die cache, and handle all the instruction reordering and retiring. As threads are passed to the processor, the instruction fetching and reordering systems allocate resources to the incoming threads. The instructions in these threads are then sent to the execution system in an alternating fashion from the level 1 cache. This continues until one of the logical processors no longer needs information from the level 1 cache and then the entire cache resource is allocated to the other logical processor. The execution core processes instructions in an order determined by dependencies in the data and availability. The processor is allowed to execute instructions out of order, that is, in a different order than the order in which they arrived. This means instructions can be executed in the order that will 3

yield the best overall performance. Schedulers inside the execution system handle the mapping and ordering, and they may send multiple instructions from one processor before executing instructions from the other. The cache system provides data to the execution system at high speeds and with larger cache lines than previous processors used. The cache operates at the same speed as the execution core so that future versions of the processor will continue to operate at correspondingly faster rates. The Xeon MP, intended for systems with four or more processors, is equipped with an integrated third-level cache to reduce the competition for shared resources between processors in the same system. Both of the logical processors inside the physical processor share all the internal caches. The cache design implements a high-level of set-associativity to minimize the possibility of the logical processors constantly throwing each other s data out of the cache to make room for their own data. In some cases, one processor may be able to fetch instructions or data into the cache for the benefit of the other processor in order to improve overall execution rates. The instruction reordering and retiring system eventually completes all the out-of-sequence instructions that were executed, and then retires them in the original program order. Upon completion, instructions are retired much the way they were originally sent, with the logical processor taking turns. Compatibility Hyper-Threading on Xeon processors is fully backward compatible with current operating systems and applications. Legacy operating systems with multiprocessor capability can run unmodified on Xeonbased HP systems. Some legacy systems, such as Windows NT, may not recognize the additional logical processors, and therefore cannot take advantage of Hyper-Threading Technology. Operating systems such as Windows 2000 Server, Linux, and Novell NetWare can recognize both of the logical processors and take advantage of Hyper-Threading Technology, depending on OS license configurations. Some of these same operating systems are expected to include optimizations for Hyper-Threading so they can distinguish between logical and alternate processors, enabling them to optimize scheduling and improve idle loops to maximize performance gains. For example, Windows 2003 Server (formerly referred to as Windows.NET) has this capability. A recent paper published by Microsoft on the topic of software compatibility stated that, Although Windows 2000 is compatible with Hyper-Threading Technology, we expect customers will get the best performance from Hyper-Threading Technology using Windows.NET Server. This is because the Windows.NET Server Family is engineered to take full advantage of the logical processors created by Hyper-Threading Technology. Microsoft expects to see positive performance gains with Windows.NET Server and Hyper-Threading Technology, while Windows 2000 performance gains are expected to be more modest. Issues with Hyper-Threading Reducing latency in a system by providing dual paths to an underutilized processor core should improve system performance. Still, there are some potential issues that should be examined, though these issues do not affect everyone. Applications Hyper-Threading Technology can actually produce a performance loss if the load at the logical processors is not balanced. Two logical processors share resources at the execution core and as a result no single processor is able to use all the resources that would normally be available to a single processor that did not implement Hyper-Threading. If one thread of an application were working and the other thread were waiting (spinning), the operating thread would still have less than 100 percent 4

of the resources. An effective load balance for a Hyper-Threading system is imperative to reduce the chances that only one thread will be active. With two logical processors sharing execution resources, the effective size of the cache with which each can operate is approximately half the actual cache size. Applications written for multithreading should therefore expect to have only half the cache available for each thread. When considering code size optimization, for example, excessive loop unrolling should be avoided. Although cache sharing may be an issue for some applications, it does provide better cache locality for other applications. For example, an application might use one logical processor to fetch data into the shared caches to reduce latency for the other logical processor. OS Operating systems use logical processors on a Hyper-Thread processor just as they do any other processor, by scheduling threads to operate on each. For optimal performance in a Hyper-Threading system, the OS should provide these optimizations: Idle-loop and HALT: A logical processor that continually checks to see if work is available will needlessly consume resources. The OS should HALT the inactive processor so that executions resources are freed up for the logical processor operating. Thread scheduling: The OS should allocate threads to one logical processor in each physical location before assigning additional threads to the alternate logical processors. This will allow threads to execute with full resources when they are available. Licensing HP server platforms based on Hyper-Threading have implemented the required BIOS changes to recognize the logical processors so that this information can be passed to the OS or application. During system boot and initialization, the multiprocessor system BIOS records only the first logical processor on each physical processor in the system and records that information in the Multiprocessor Specification (MPS) table to preserve backward compatibility. This aids any legacy OS that only uses the MPS for determining system configuration and is not capable of using the alternate logical processors. Next, the BIOS records each of the alternate logical processors into the Advanced Configuration and Power Interface (ACPI) table. This allows an OS that uses the ACPI table to see and schedule threads for all of the logical processors. It is critical that the BIOS record the first logical processor of each physical processor before recording any alternates. Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology; Windows 2000 simply fills out the license limit using the first processors counted by the BIOS. For example, when Windows 2000 Server (4-CPU limit) is launched on a four-way system enabled with Hyper-Threading Technology, Windows will use the first logical processor on each of the four physical processors; the second logical processor on each physical processor will remain unused, because of the 4-CPU license limit. (This statement assumes that the BIOS was written according to Intel specifications. Windows uses the processor count and sequence indicated by the BIOS.) However, when Windows 2000 Advanced Server (8-CPU limit) is launched on a four-way system enabled with Hyper-Threading Technology, the OS will use all eight logical processors. Although the OS will recognize all eight logical processors in this example, in most cases performance would be better using eight physical processors. When examining the processor count provided by the BIOS, Windows 2003 Server distinguishes between logical and physical processors, regardless of how they are counted by the BIOS. This 5

provides a powerful advantage over Windows 2000, in that Windows 2003 Server counts only physical processors against the license limit. For example, if Windows 2003 Standard Server (2-CPU limit) is launched on a two-processor system enabled with Hyper-Threading Technology, the OS will use all four logical processors. 1 Performance gains Intel Hyper-Threading Technology enables simultaneous multi-threading at the processor level. In the current implementation the two logical processors on each physical processor share most execution resources but maintain separate architectural states. Tests run thus far in HP labs substantiate the expected performance gains for Hyper-Threading (Figures 2 and 3). For the standard workload of WebBench, the Hyper-Threading Technology showed a peak performance advantage of 12 percent in a dual processor server and a 44 percent performance advantage with a single processor. The differences observed in these cases are due entirely to the Hyper-Threading Technology, since the processors and systems were otherwise identical. Figure 2. WebBench test results as a function of the number of physical clients (engines) hp confidential - preliminary 7000 hp ProLiant DL380 G3 WebBench TM 5.0 - Ecommerce_cgi_win2k Script 6000 requests 5000 per second 4000 3000 2000 1000 0 1 2x2.8GHz / HT On 2x2.8GHz / HT Off 1x2.8GHz / HT On 1x2.8GHz / HT Off 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 number of engines 1 Microsoft Windows-Based Servers and Intel Hyper-Threading Technology, John Borozan, Microsoft Corporation, February 2002 6

Figure 3. WebBench test results as a function of the number of processors hp confidential - preliminary 7000 hp ProLiant DL380 G3 WebBench TM 5.0 - Ecommerce_cgi_win2k Script 6000 Hyper-Threading On Hyper-Threading Off 5000 requests per second 4000 3000 2000 1000 0 2 processors 1 processor Performance results depend on many factors, of course, including installed memory, the application in use and the memory footprint it requires, as well as the number of simulated clients. Other tests results that were not ready in time for publication of this paper indicated that the performance gain was smaller in the on-line transaction processing application space, where the observed performance delta was between 5 and 14 percent. Conclusion Tests performed in HP labs indicate that Hyper-Threading fulfills much of its promise, as substantial performance gains were seen when using Xeon processors that incorporate Hyper-Threading. The gains reported from using Hyper-Threading ranged from as little as 5 percent in a multi-processor OLTP test to as high as 44 percent in a single-processor system running the WebBench benchmark test. This wide range emphasizes that performance is highly dependent on the type of application and other factors. As with any processor metric, actual performance benefits in the field will vary with system implementation choices such as installed memory, cache size, application type, and the memory footprint used by that application. 7

For more information For additional information, refer to the HP ProLiant server website at www.hp.com/go/proliant. Call to action Send comments about this paper to: TechCom@HP.com. 2003 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Microsoft, Windows, and Windows NT are U.S. registered trademarks of Microsoft Corporation. TC030306TB, 03/2003