i960 Microprocessor Performance Brief October 1998 Order Number:

Similar documents
Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

Intel Desktop Board DZ68DB

Intel Desktop Board D975XBX2

Intel Cache Acceleration Software for Windows* Workstation

Software Evaluation Guide for WinZip 15.5*

Intel Desktop Board D945GCCR

Intel Desktop Board D946GZAB

Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance

Software Evaluation Guide for CyberLink MediaEspresso *

Intel Desktop Board DG41CN

Intel Desktop Board D945GCLF2

Intel Desktop Board D945GCLF

Upgrading Intel Server Board Set SE8500HW4 to Support Intel Xeon Processors 7000 Sequence

Software Evaluation Guide for WinZip* esources-performance-documents.html

Intel Desktop Board DG31PR

INTEL PERCEPTUAL COMPUTING SDK. How To Use the Privacy Notification Tool

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors

Intel Desktop Board DP55SB

Software Evaluation Guide for Photodex* ProShow Gold* 3.2

Intel Cache Acceleration Software - Workstation

Intel Desktop Board DG41RQ

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

LED Manager for Intel NUC

Software Evaluation Guide for Sony Vegas Pro 8.0b* Blu-ray Disc Image Creation Burning HD video to Blu-ray Disc

Bitonic Sorting Intel OpenCL SDK Sample Documentation

Intel Desktop Board D845PT Specification Update

i960 VH Embedded-PCI Processor

Installation Guide and Release Notes

Intel vpro Technology Virtual Seminar 2010

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Intel Desktop Board DH55TC

Forging a Future in Memory: New Technologies, New Markets, New Applications. Ed Doller Chief Technology Officer

Installation Guide and Release Notes

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003

Software Evaluation Guide Adobe Premiere Pro CS3 SEG

Intel Desktop Board DH61SA

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Intel 815 Chipset Family: Graphics and Memory Controller Hub (GMCH)

How to Create a.cibd File from Mentor Xpedition for HLDRC

Intel Desktop Board D102GGC2 Specification Update

3 Volt Intel StrataFlash Memory to Motorola MC68060 CPU Design Guide

2013 Intel Corporation

How to Create a.cibd/.cce File from Mentor Xpedition for HLDRC

Designing an ITU G.742 Compliant PDH Multiplexer with the LXT332 Dual Transceiver

Drive Recovery Panel

Intel Desktop Board DP67DE

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012

Intel G31/P31 Express Chipset

Intel Atom Processor E3800 Product Family Development Kit Based on Intel Intelligent System Extended (ISX) Form Factor Reference Design

Intel Desktop Board DH61CR

Intel Extreme Memory Profile (Intel XMP) DDR3 Technology

Intel Desktop Board DQ35JO

Intel Desktop Board DP45SG

StrongARM** SA-110/21285 Evaluation Board

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Open Source HD Graphics, Intel Iris Graphics, and Intel Iris Pro Graphics

Intel Desktop Board D915GUX Specification Update

Intel Desktop Board D915GEV Specification Update

StrongARM SA-1100 Development Board Firmware Kit

Optimizing the operations with sparse matrices on Intel architecture

Mobile Client Capability Brief for Exporting Mail in Microsoft* Office* Outlook* 2007

Intel Desktop Board D845HV Specification Update

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Installation Guide and Release Notes

Intel Desktop Board DP43TF

Migrating from a 440ZX-66 AGPset to a 440BX AGPset in a Celeron Processor Design

3 Volt Intel StrataFlash Memory to i960 H CPU Design Guide

Intel Atom Processor D2000 Series and N2000 Series Embedded Application Power Guideline Addendum January 2012

Intel Core TM i7-4702ec Processor for Communications Infrastructure

Intel 945(GM/GME)/915(GM/GME)/ 855(GM/GME)/852(GM/GME) Chipsets VGA Port Always Enabled Hardware Workaround

Considerations When Using the 66 MHz as an Accelerated Graphics Port - Peripheral Component Interconnect Bridge

GUID Partition Table (GPT)

BI440ZX Motherboard Specification Update

IEEE1588 Frequently Asked Questions (FAQs)

Intel Core TM Processor i C Embedded Application Power Guideline Addendum

Migrating from the 8XC251Sx to the 8XC251Tx

Intel Open Source HD Graphics Programmers' Reference Manual (PRM)

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3

Introduction. How it works

82551QM/ER/IT Fast Ethernet PCI Controller MDI-X Functional Overview. Application Note (AP-472)

Software Occlusion Culling

80C186XL/80C188XL EMBEDDED MICROPROCESSORS SPECIFICATION UPDATE

Using Tasking to Scale Game Engine Systems

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Intel 852GME/852PM Chipset Graphics and Memory Controller Hub (GMCH)

Device Firmware Update (DFU) for Windows

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

DRAM Refresh Control with the

Intel Platform Controller Hub EG20T

Intel Desktop Board DQ57TM

Open FCoE for ESX*-based Intel Ethernet Server X520 Family Adapters

Boot Agent Application Notes for BIOS Engineers

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

80C31BH, 80C51BH, 80C51BHP, 87C51 SPECIFICATION UPDATE

Intel 6400/6402 Advanced Memory Buffer

Intel USB 3.0 extensible Host Controller Driver

The Intel Processor Diagnostic Tool Release Notes

Transcription:

Performance Brief October 1998 Order Number: 272950-003

Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel s Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The i960 microprocessors may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800- 548-4725 or by visiting Intel s website at http://www.intel.com. Copyright Intel Corporation, 1998 *Third-party brands and names are the property of their respective owners. Performance Brief

Contents 1.0 Introduction...5 2.0 i960 Microprocessor REF Product Highlights...7 3.0 Methodology...7 4.0 Benchmark Programs...9 5.0 Development Workstation/Operating System...9 5.1 Compiler/Assembler/Linker...9 5.2 Target Hardware...10 5.3 Memory Interface Configurations...11 6.0 Relative Performance of the 80960 Microprocessor Family...12 6.1 Dhrystone Version 2.1...12 6.2 Dhrystone MIPS...13 6.3 Whetstone Single Precision...14 6.4 Whetstone Double Precision...15 Figures 1 i960 Family Features Comparison...5 2 Relative Performance of i960 Microprocessors (not frequency normalized)...6 3 Dhrystone Version 2.1 Performance...12 4 Dhrystone MIPS Relative Performance...13 5 Whetstone-Single Precision Relative Performance...14 6 Whetstone Double Precision Relative Performance...15 Tables 1 Typical 32-bit Microprocessor Memory Systems...8 2 Benchmark Program Descriptions...9 3 DRAM Access Times...11 Performance Brief 3

1.0 Introduction This report describes the results of performance benchmarks for the Intel i960 microprocessor family. The benchmark results were obtained from Intel s web-based Remote Evaluation Facility (REF) located at the following URL: http://developer.intel.com/design/i960/testcntr/ The REF site allows users to evaluate the i960 processor family using provided benchmarks. Users can also download application-specific benchmark programs. The REF is a protected web site that allows the establishment of specific user accounts. The benchmark results in this report are indicators only and should not be the sole deciding factor in microprocessor selection. The performance benefits described in this report are not guaranteed for any particular implementation of an i960 microprocessor in an embedded application. The results published in this report were obtained using Cyclone evaluation platforms for the i960 processor. The results should not be compared with other published microprocessor performance measurements. Variations in microprocessor platforms can significantly impact the results of performance benchmarks. The i960 microprocessors covered in this report are: 80960CA, 80960CF, 80960JA, 80960JF, 80960JD, 80960JT, 80960HA, and 80960HD. Figure 1 compares the features of the members of the i960 microprocessor family. Figure 1. i960 Family Features Comparison 80960HD 80960HA 80960HT 80960CA Demultiplexed Bus Superscalar Core 4-Channel DMA 1K I-Cache 1K RAM 80960SB 80960SA 80960CF Demultiplexed Bus Socket Compatible Superscalar Core 4-Channel DMA 4K I-Cache 1K D-Cache 80960KB 80960KA 32-Bit Address / Data KB has FPU 512 Byte I-Cache 3x 80960 CF Performance Demultiplexed Bus Superscalar Core 16K I-Cache, 8K D-Cache HD Clock Doubled HT Clock Tripled 80960JA 80960JF 80960JD 80960JT 3-5x 80960 Kx Performance Scalar Core JT 16K I-Cache, 8K D-Cache JF/JD 4K I-Cache, 2K D-Cache JA 2K I-Cache, 1K D-Cache Low Power Features JD Clock Doubled JT Clock Tripled 32-Bit Address / 16-Bit Data SB has FPU 512 Byte I-Cache Not covered in this report. Performance Brief 5

The basis of the comparison is a set of synthetic benchmark programs executed on all i960 microprocessors. Why synthetic benchmarks? The best indicator of performance is obtained using a customer s own benchmark program or, in the absence of that, resembling the actual application. Developing a benchmark, however, takes time and money and may not be feasible for all customers. Also, a customer who is considering microprocessor performance may not have completed development of an application program. Due to these constraints, developers may use synthetic benchmarks as one indicator of microprocessor performance. The results of synthetic benchmark performance should not be the sole factor in selecting a microprocessor. Benchmark data in this report is presented using simple bar charts. Figure 2 provides an example that shows the relative performance of various i960 microprocessors. Figure 2. Relative Performance of i960 Microprocessors (not frequency normalized) RELATIVE PERFORMANCE (DHRYSTONE ver 2.1) 4.5 4.0 3.5 3.0 2.7 2.8 3.4 3.5 4.3 Compiler Optimization- Speed (Profiled) 2.5 2.0 1.8 1.5 1.0 1.0 1.2 0.5 0.0 CA-33 CF-40 JA-33 JF-33 JD-66 JT-100 HA-40 HD-66 The programs used in this report were selected based on their general acceptance and well-known behavior, and because they are written in the C programming language. Each was chosen to provide an approximate, relative measurement of the performance of the microprocessor/compiler combination. The benchmark programs used include Dhrystone*, Dhrystone MIPS, Single Precision Whetstone*, and Double Precision Whetstone. 6 Performance Brief

2.0 i960 Microprocessor REF Product Highlights Intel s web-based Remote Evaluation Facility (REF) is located at the following URL: http://developer.intel.com/design/i960/testcntr/ i960 processor REF product highlights include: Private, password-protected accounts for storing your source files Easy-to-use Web interface Allows executing code on any of the following i960 processors: SA, SB, KA, KB, CA, CF, JA, JD, JF, JT, HA, HD 80960 processor speeds benchmarked in this report are as follows: HD-66-33 MHz external clock, 66 MHz core clock HA-40-40 MHz external clock, 40 MHz core clock JT-100-33 MHz external clock, 100 MHz core clock JD-66-33 MHz external clock, 66 MHz core clock JF-33-33 MHz external clock, 33 MHz core clock JA-33-33 MHz external clock, 33 MHz core clock CF-40-40 MHz external clock, 40 MHz core clock CA-33-33 MHz external clock, 33 MHz core clock Benchmark programs provided as example projects include: Dhrystone Single Precision Whetstone Double Precision Whetstone Source file compilation using the Intel CTOOLS Development Suite CTOOL compiler supports C and C++ programming language Easy-to-use timers for measuring the speed of your programs and subroutines or timing your critical code sections Three levels of optimization are available for compiling code (NONE, SPEED, and SIZE) Links to on-line manuals for CTOOLS and processors 3.0 Methodology There are several choices in memory technology when designing a microprocessor memory subsystem. DRAM is the predominant read/write memory technology. ROM, Flash, and EPROM are predominant technologies for read-only (or read-mostly) memory subsystems. SRAM subsystems may be expensive beyond practicality, as code and data size of applications continue to grow beyond 1 Mbyte. Performance Brief 7

The system designer must work within the bounds of these available memory technologies. For example, if a system designer must use 60 ns DRAM technology, the relevant question of performance is: How fast can the microprocessor execute using 60 ns DRAM? A 32-bit microprocessor memory subsystem typically resembles one of the three subsystems described in Table 1. Table 1. Typical 32-bit Microprocessor Memory Systems Cost Performance Code Size Description High High Small to Moderate Code and data are located in fast SRAM. SRAM provides access times on the order of 10-35 ns. Not a common design, but relatively simple to implement. Practical for applications that have little code and are not cost-sensitive. Moderate Moderate to High Moderate to Large Code and data are located in DRAM. The code and initialized data are loaded from a backplane bus or inexpensive ROM at initialization. Mainstream, inexpensive DRAM technology typically provides 60 to 70 ns access time and fast page mode access capability. The system designer can trade off interface cost and performance using different degrees of complexity such as burst mode support and interleaving. Performance may also be enhanced in these systems with a small, fast SRAM dedicated to frequently accessed data. Low Low to Moderate Large Code is executed from ROM. Data is located in DRAM. The system designer can increase performance by interleaving the ROM subsystem. Because this design is driven by low cost, a DRAM subsystem is typically implemented at the lowest possible cost. Microprocessor performance is influenced by several elements; these are categorized below as either intrinsic or extrinsic: Intrinsic elements generally are not or cannot be varied for performance analysis purposes. These properties are inherent to the microprocessor: Architecture and internal implementation (e.g., cache size, instruction set, registers) Efficiency of the external memory interface (e.g., instruction fetch bandwidth) Compiler efficiency (assuming that the best compiler is selected with the highest optimization enabled) Extrinsic elements can be varied depending on application requirements, performance, and cost constraints: Clock speed Bus bandwidth (memory wait states) Because many factors influence microprocessor performance, it is necessary to choose an equitable reference or baseline for a fair performance comparison. For this performance report, memory technology and clock speed are used as the common reference for measuring the performance of widely available industry benchmark programs. 8 Performance Brief

4.0 Benchmark Programs Table 2. Benchmark Program Descriptions Program Description Units of Measure Dhrystone Dhrystone MIPS Whetstone Tests integer performance. String manipulation is a common action in this program. Version 2.1 is used here. Measures instructions per second, based on Dhrystone v2.1 Measure of floating-point performance. Includes single and double precision floating-point performance. Thousands of Dhrystones per second Millions of instructions per second Millions of Whetstones per second 5.0 Development Workstation/Operating System Host: NCR S26 personal computer with a Pentium Pro processor at 200 MHz OS: Windows* NT 4.0 Service Pack 3 Memory: 130 Mbytes Hard disk size: 2 Gbytes Video: ATI Mach64*, 1 Mbyte memory Note: The host and operating system specified above do not influence the performance benchmarks contained in this report. The platform and operating system are used to execute the GNU compiler, download code to the 80960 evaluation boards, and store results of each benchmark run. 5.1 Compiler/Assembler/Linker The software used to compile the benchmark code for this report is the GNU960 v6.0.6 The compiler optimizations used for the web-based i960 Remote Evaluation Facility include the NONE, SIZE, and SPEED options. These options are used to define the neccesary compiler optimization switches, as follows: NONE: No compiler optimizations, the -O0 option SIZE: For non-profiling, the -O4 option SPEED: For profiling, the two pass compile option (provides program-wide optimization) Refer to the ic960 Compiler User s Guide (order number 651230) for further details on optimization options. Performance Brief 9

5.2 Target Hardware All performance numbers were obtained using a Cyclone Evaluation Platform (Cyclone EP) From Cyclone Microsystems with 8 Mbyte of 60 ns interleaved DRAM The Cyclone EP is a stand-alone, general purpose evaluation and development tool for Intel s family of i960 embedded processors. The main board provides the capability to install one of several i960 microprocessor modules. A design engineer can install different CPU modules to evaluate the various i960 microprocessors in one system environment. This type of environment is desirable when evaluating CPU performance. Further information on 80960 Cyclone Evaluation Platforms can be found at the following URLs: http://www.cyclone.com/ http://support.intel.com/support/processors/i960/evalboards/eval_faq.htm 10 Performance Brief

5.3 Memory Interface Configurations These DRAM access times are specific to the 80960 Cyclone Evaluation Platform. Table 3 was obtained from the Cyclone i960 Microprocessor Evaluation Platform User s Guide (order number 272577). Table 3. DRAM Access Times Frequency (MHz) Operation DRAM Speed (ns) Clock Cycles Wait States (x1,x2,x3,x4) Sustained Bandwidth 1 (Mbytes/sec) 16 Read 60, 70 3,1,1,1 1,0,0,0 36 20 Read 60, 70 3,1,1,1 1,0,0,0 45 25 Read 60 3,1,1,1 1,0,0,0 66 25 Read 70 4,1,1,1,1 2 2,0,0,0 50 33 Read 60, 70 4,1,1,1,1 2 2,0,0,0 66 40 Read 60 4,1,1,1,1 2 2,0,0,0 80 40 Read 70 5,2,2,2,1 2 3,1,1,1 53 16 Write 60, 70 3,2,2,2 1,1,1,1 25.6 20 Write 60, 70 3,2,2,2 1,1,1,1 32 25 Write 60 3,2,2,2 1,1,1,1 44.5 25 Write 70 4,2,2,2,1 2 2,1,1,1 36 33 Write 60, 70 4,2,2,2,1 2 2,1,1,1 48 40 Write 60 4,2,2,2,1 2 2,1,1,1 58 40 Write 70 4,2,2,2,1 2 2,1,1,1 53 NOTES: 1. Bandwidths stated are sustained bandwidths, not peak. 2. The extra cycle is the overhead of DRAM precharge. DRAM precharge time impacts back-to-back cycles only. Performance Brief 11

6.0 Relative Performance of the 80960 Microprocessor Family Note: These results are not frequency normalized. The following graphs show the relative performance of the i960 microprocessor family. 6.1 Dhrystone Version 2.1 Memory: Interleaved 60 ns DRAM Wait State Profile: 16/20/25 MHz: 10111 33/40 MHz: 20121 Unit of Measure: Thousand/Second (Larger is better) Figure 3. Dhrystone Version 2.1 Performance RELATIVE PERFORMANCE (DHRYSTONE ver 2.1) Thousand Dhrystones per Second 140 120 100 80 60 40 20 Compiler Optimization None (-00) Compiler Optimization Size (-04) Compiler Optimization Speed (Profiled) 0 CA-33 CF-40 JA-33 JF-33 JD-66 JT-100 HA-40 HD-66 12 Performance Brief

6.2 Dhrystone MIPS Note: Memory: Interleaved 60 ns DRAM Wait State Profile: 16/20/25 MHz: 10111 33/40 MHz: 20121 Unit of Measure: MIPS (Larger is better) MIPS performance numbers are extrapolated from Dhrystone v 2.1 performance numbers and supplied for indication only. Dhrystone MIPS = Dhystones per second (ver 2.1) divided by 1657. Figure 4. Dhrystone MIPS Relative Performance RELATIVE PERFORMANCE (DHRYSTONE MIPS) 90 80 70 Compiler Optimization None (-00) Compiler Optimization Size (-04) Compiler Optimization Speed (Profiled) 60 50 40 30 20 10 0 CA-33 CF-40 JA-33 JF-33 JD-66 JT-100 HA-40 HD-66 Performance Brief 13

6.3 Whetstone Single Precision Memory: Interleaved 60 ns DRAM Wait State Profile: 16/20/25 MHz: 10111 33/40 MHz: 20121 Unit of Measure: Million Whetstones per Second (Larger is better) Figure 5. Whetstone-Single Precision Relative Performance RELATIVE PERFORMANCE (Whetstone-Single precision) Million Whetstones per Second 14 12 10 8 6 4 2 Compiler Optimization None (-00) Compiler Optimization Size (-04) Compiler Optimization Speed (Profiled) 0 CA-33 CF-40 JA-33 JF-33 JD-66 JT-100 HA-40 HD-66 14 Performance Brief

6.4 Whetstone Double Precision Memory: Interleaved 60 ns DRAM Wait State Profile: 16/20/25 MHz: 10111 33/40 MHz: 20121 Unit of Measure: Million Whetstones per Second (Larger is better) Figure 6. Whetstone Double Precision Relative Performance RELATIVE PERFORMANCE (Whetstone-Double precision) Million Whetstones per Second 10 9 8 7 6 5 4 3 2 Compiler Optimization None (-00) Compiler Optimization Size (-04) Compiler Optimization Speed (Profiled) 1 0 CA-33 CF-40 JA-33 JF-33 JD-66 JT-100 HA-40 HD-66 Performance Brief 15