Benchmarking AMD64 and EMT64

Similar documents
ISTITUTO NAZIONALE DI FISICA NUCLEARE

Developing a Powerful yet Inexpensive Computational Infrastructure for the UT Dept. of Nuclear Engineering. David D. Dixon April 8, 2009

Microsoft. iron Krokhmal et IT /2005

MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2

Initial Explorations of ARM Processors for Scientific Computing

Minimum Hardware and OS Specifications

DEDICATED SERVERS WITH EBS

Building 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004

Microsoft SQL Server 2012

DEDICATED SERVERS WITH WEB HOSTING PRICED RIGHT

Efficient Power Management

What Transitioning from 32-bit to 64-bit x86 Computing Means Today

CPU Performance/Power Measurements at the Grid Computing Centre Karlsruhe

On-Demand Supercomputing Multiplies the Possibilities

CurrentWare SQL Server Configuration Guide

Short Note. Cluster building and running at SEP. Robert G. Clapp and Paul Sava 1 INTRODUCTION

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA

Virtualization. Michael Tsai 2018/4/16

CERN openlab II. CERN openlab and. Sverre Jarp CERN openlab CTO 16 September 2008

White Paper. File System Throughput Performance on RedHawk Linux

Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers

Use of containerisation as an alternative to full virtualisation in grid environments.

Properly Sizing Processing and Memory for your AWMS Server

High Volume Transaction Processing in Enterprise Applications

Vmware VCP-101V. Infrastructure with ESX Server and VirtualCenter. Download Full Version :

Geant4 Computing Performance Benchmarking and Monitoring

GTRC Hosting Infrastructure Reports

Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU

PAC094 Performance Tips for New Features in Workstation 5. Anne Holler Irfan Ahmad Aravind Pavuluri

IBM Emulex 16Gb Fibre Channel HBA Evaluation

The Use of Cloud Computing Resources in an HPC Environment

Performance of Multicore LUP Decomposition

Meet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief

Virtualizing a Batch. University Grid Center

What is Remote PHY? Virtualization of the Core

Operating Systems. Written by Justin Browning. Linux / UNIX Distributions Report

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA

Virtual machine provisioning, code management, and data movement design for the Fermilab HEPCloud Facility

Managing Hardware Power Saving Modes for High Performance Computing

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

COL862 Programming Assignment-1

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

Energy Management of MapReduce Clusters. Jan Pohland

Lecture 3: Evaluating Computer Architectures. How to design something:

Performance comparison between a massive SMP machine and clusters

Ekran System System Requirements and Performance Numbers

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

Forensic Toolkit System Specifications Guide

Improving Performance and Power of Multi-Core Processors with Wonderware and System Platform 3.0

Servicing HEP experiments with a complete set of ready integreated and configured common software components

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

Performance Comparisons of Dell PowerEdge Servers with SQL Server 2000 Service Pack 4 Enterprise Product Group (EPG)

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

CSC501 Operating Systems Principles. OS Structure

HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON

Affordable and power efficient computing for high energy physics: CPU and FFT benchmarks of ARM processors

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

XEN and KVM in INFN production systems and a comparison between them. Riccardo Veraldi Andrea Chierici INFN - CNAF HEPiX Spring 2009

2 to 4 Intel Xeon Processor E v3 Family CPUs. Up to 12 SFF Disk Drives for Appliance Model. Up to 6 TB of Main Memory (with GB LRDIMMs)

STAR Watch Statewide Technology Assistance Resources Project A publication of the Western New York Law Center,Inc.

The Effect of NUMA Tunings on CPU Performance

A Case for High Performance Computing with Virtual Machines

Performance Evaluation Using Network File System (NFS) v3 Protocol. Hitachi Data Systems

Compact Muon Solenoid: Cyberinfrastructure Solutions. Ken Bloom UNL Cyberinfrastructure Workshop -- August 15, 2005

Tracing and Visualization of Energy Related Metrics

Scaling PostgreSQL on SMP Architectures

Hardware Sizing Guide OV

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION

Evaluation Report: HP StoreFabric SN1000E 16Gb Fibre Channel HBA

H.J. Lu, Sunil K Pandey. Intel. November, 2018

Robert Jamieson. Robs Techie PP Everything in this presentation is at your own risk!

The ATLAS Tier-3 in Geneva and the Trigger Development Facility

Geant4 MT Performance. Soon Yung Jun (Fermilab) 21 st Geant4 Collaboration Meeting, Ferrara, Italy Sept , 2016

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers

CADWorx DraftPro Questions and Answers

Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft. Presented by Manfred Alef Contributions of Jos van Wezel, Andreas Heiss

Technical Brief: Specifying a PC for Mascot

REMEM: REmote MEMory as Checkpointing Storage

Looking ahead with IBM i. 10+ year roadmap

Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud

Evaluation of Intel Memory Drive Technology Performance for Scientific Applications

Benchmark Performance Results for Pervasive PSQL v11. A Pervasive PSQL White Paper September 2010

Windows 64-bit FAQ & Support

A Comprehensive Study on the Performance of Implicit LS-DYNA

Scientific Computing at SLAC

Moving from 32 to 64 bits while maintaining compatibility. Orlando Ricardo Nunes Rocha

Performance of Variant Memory Configurations for Cray XT Systems

Cloud Computing Capacity Planning

The SHARED hosting plan is designed to meet the advanced hosting needs of businesses who are not yet ready to move on to a server solution.

Fermi National Accelerator Laboratory

On the Use of Large Intel Xeon Phi Clusters for GEANT4-Based Simulations

AMD Opteron Processors In the Cloud

Embedded Hardware and OS Technology Empower PC-Based Platforms

Benchmark of a Cubieboard cluster

Chapter 05. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

HTCondor on Titan. Wisconsin IceCube Particle Astrophysics Center. Vladimir Brik. HTCondor Week May 2018

CAPP 2005 Zeuthen. 64-bit Linux. Stephan Wiesand DESY -DV -

Benchmark Study: A Performance Comparison Between RHEL 5 and RHEL 6 on System z

Transcription:

Benchmarking AMD64 and EMT64 Hans Wenzel, Oliver Gutsche, FNAL, Batavia, IL 60510, USA Mako Furukawa, University of Nebraska, Lincoln, USA Abstract We have benchmarked various single and dual core 64 Bit CPU s from AMD (AMD64) and Intel (EMT64) using various HEP and CMS specific applications. This results are compared with results obtained with 32 Bit (IA32) CPU s. INTRODUCTION There are now two 64-bit implementations in the Intel compatible processor marketplace: the Intel EM64T and AMD AMD64. For information about the architecture of the Opteron chip see [1],[2],[3]. In addition, dual-core CPU s are now available. They promise to nearly double the computing power while not increasing the infrastructure cost especially for cooling and electricity significantly compared to boxes equipped with single-core chips [4]. We should also benefit from: lower space requirements and need for fewer racks, fewer console connections and fewer network connections. Although the later might adversely affect the performance when one network interface is shared by 4 instead of 2 CPU-cores. We were especially interested in how the performance of the multicore processors scales when each CPU core is running a copy of the application simultaneously and how much power these chips consume. Table lists the various processor types that we evaluated. OPERATION MODES There are three distinct operation modes available for the AMD64 and EM64T architectures: 32-bit legacy mode (32Bit/32Bit) In this mode, both AMD64 and EM64T processors will act just like any other IA32 compatible processor. You can install your 32-bit OS on such a system and run 32-bit applications, however, you will not be able to make use of the new features such as the flat memory addressing above 4 GB or the additional General Purpose Registers (GPR). Compatibility mode (64Bit/32Bit) This is an intermediate mode of the full 64-bit mode described below. Compatibility mode gives you the ability to run a 64-bit operating system while still being able to run unmodified 32-bit applications. Each wenzel@fnal.gov Processor Speed Architecture (cores) Opteron 244 1.8 Ghz AMD-64 ( 1) Opteron 265 1.8 Ghz AMD-64 ( 2) Opteron 246 2.0 Ghz AMD-64 ( 1) Opteron 246HE 2.0 Ghz AMD-64 ( 1) Opteron 270 2.0 Ghz AMD-64 ( 2) Athlon 64 3500+ 2.2 Ghz AMD-64 ( 1) Athlon 64 4200+ 2.2 GHz AMD-64 ( 2) Opteron 248 2.2 Ghz AMD-64 ( 1) Opteron 275 2.2 Ghz AMD-64 ( 2) Opteron 250 2.4 Ghz AMD-64 ( 1) Xeon 3.4 Ghz EM64T ( 1) Xeon 3.6 Ghz EM64T ( 1) Xeon 2.4 Ghz IA32 ( 1) Xeon 2.8 Ghz IA32 ( 1) Table 1: Processors tested. We used single CPU mainboards for the Athlon 64 CPU s, all the other processors were tested with dual CPU mainboards. The 246HE is a low power version of the Opteron performs just as well as the normal one with the same speed. 32-bit application will still be limited to a maximum of 4 GB of physical memory. However, the 4 GB limit is now imposed on a per-process level, not at a systemwide level. Full 64-bit mode (64Bit/64Bit) This is the full 64 Bit mode which means that a 64-bit operating system and 64-bit applications are used. In this mode, an application can have a virtual address space of up to 40-bits (1 TB of addressable memory). Applications that run in full 64-bit mode have access to the full physical memory range, and also have access to the new and expanded GPRs. INSTALLATION AND ADMINISTRATION 64 Bit Linux distributions have been available for quite a while now. In principle the new platform behaves like any PC, installation and booting the system just works as we are used to. We also successfully used the Rocks Cluster Distribution [5] to install the OS. As operating system we used Scientific Linux 3.04 which is available in 32 Bit and 64 Bit and comes with the 2.4.21 kernel. With the new platform system management becomes a little bit more complex. When running with a 64 Bit OS in addition to the 64 Bit libraries also the 32 Bit Environment

must be provided and maintained to allow applications to run in compatibility mode. Some of the tools (e.g. APT) are not multi platform aware and for other tools (YUM, RPM) the support for the new environment could be improved. BENCHMARK APPLICATIONS We wanted to run benchmarks related to HEP and CMS physics. We used the analysis tool ROOT [6], the MonteCarlo generator Pythia [7], the CMS detector simulation program OSCAR and the CMS event reconstruction programs ORCA citecmsoo. The CMS applications OSCAR amd ORCA could only be benchmarked in (32Bit/32Bit) and (64Bit/32Bit) mode since no 64 Bit version exist. The CMS software was installed using DAR [9]. In [10] one can find results Benchmarking results for a different set of HEP Astroparticle Physics applications. Root and PYTHIA were compiled as 32 Bit and 64 Bit application. For the compilation we used the g++ (ROOT) and g77 (Pythia) compilers based on gcc version 3.2.3 that is provided with Scientific Linux 3.04. The CMS applications and ROOT are dynamically linked C++ programs while PYTHIA is a statically linked FORTRAN program. ROOT: We used the 4.02/00 production version. We used the stress benchmark which is a suite of programs that tests the essential parts of Root. Pythia: For the benchmark we generate 100.000 supersymmetry events at s =14 TeV (the Pythia main65.f example). The code is compiled with g77 with the O2 option: g77 -O2 - o main65 main65.f pythia6227.o. OSCAR 3 7 0: The GEANT 4 based CMS detector simulation program. Here we simulate 300 single pion events of 50 GeV/c P t. ORCA 8 7 1: The OSCAR output events are then digitized and reconstructed. Table shows the matrix of the different applications and operation modes that we used to benchmark the performance. Application (up to 4 copies) 32-Bit leg. Compatib. 64-Bit (32/32) (64/32) (64/64) ROOT: stress Yes Yes Yes Pythia Yes Yes Yes OSCAR Yes Yes N/A ORCA Digi. Yes Yes N/A ORCA Reco. Yes Yes N/A Table 2: Matrix of various benchmarks run in the different operating modes. BENCHMARKING RESULTS In this section we present the various benchmark results. The Opteron 250 that we tested in (64/64) Bit mode runs ROOT stress/pythia about 2.4/2.2 times faster than the 2.4 GHz Xeons in (32/32). The Opteron 275 which allows to run 2 applications simultaneous per CPU runs ROOT stress/pythia about 4.4/4.1 times faster than the 2.4 GHz Xeons in legacy mode. The Pythia results are summarized in Table. For the dual core CPUs we see basically no drop in performance when running up to 4 copies of Pythia simultaneously (1 per core). We observe a 20 % boost in performance when running in (64/64) bit mode compared to (64Bit/32Bit) compatibility mode for both AMD64 and EMT64. There is a small drop in performance ( 1%) for the Opterons when running in compatibility mode compared to legacy mode. Table summarizes the Root Stress Benchmarking results. We observe that this application also scales very well when running up to 4 copies (1 per core) on the dual core machines. There is only a 1.6% performance drop when running 4 copies of stress simultaneously on the dual-cores. We observe that this application runs about 6% slower in compatibility mode compared to legacy mode. Also in this case the Opteron runs more than 40 % faster in 64Bit mode compared to (32/32) compared to the 64 Bit Xeons which only gain approximately 20%. Table summarizes the OSCAR Simulation Benchmarking results. We observe that OSCAR scales very well resulting in a performance drop of only 0.9%/2.2%/1.6% on the 265/270/275 running up to 4 copies (1 per core) of OS- CAR simultaneously. OSCAR is a very CPU and memory intesive application but it doesn t require a lot of io Finally Table summarizes the CMS ORCA Benchmarking results. As the Applications become more IO intense we observe that the efficiency drop increases as the processes compete for disk IO bandwidth. For the ORCA digitization the drop is 0.6 %/3.3 %/2.8 % on the 265/270/275 running 4 copies simultaneously. For the ORCA reconstruction application we observe a significant drop in performance of 1.4 %/8.8 %/21.8 % for the 265/270/275 respectively. CMS software runs in 64 Bit compatibility mode and 32 Bit native mode without any modifications to the code. We observe OSCAR and ROOT applications to run around 5% slower in compatibility mode than in legacy mode. The effect is even more pronounced for the digitization step where the loss in performance is around 15 %. When running in 64 Bit native mode we observe speed increases of 21% (Pythia) and 40% (ROOT) compared to 32 Bit native mode. The CMS applications OSCAR and ORCA will be replaced by a new framework which will be ported to the new 64 Bit platform. POWER CONSUMPTION Nowadays the amount of computing power that can be provided is very often limited by the amount of cooling and

Pythia Benchmarking results CPU legacy (32/32)Bit Compatibility (64/32)Bit 64-bit (64/64)Bit (Evts/sec) per core for (Evts/sec) per core for (Evts/sec) per core for 1/ 2 processes 1/ 2/ 3/ 4 processes 1/ 2/ 3/ 4 processes Opteron 244 91.0/91.0 90.3/90.4/-/- 110.1/110/-/- Opteron 265 -/- 91.98/92.11/92.1/92.03 110.9/110.91/110.8/110.38 Opteron 246 101.3/101.4 100.9/101.1/-/- 121.2/121.5/-/- Opteron 270 -/- 102.36/102.44/102.35/102.31 122.95/122.95/123.02/123.07 Athlon 64 3500+ -/- 111.06/-/-/- 129.8/-/-/- Athlon 64 2 4200+ -/- 106.7/102.9/-/- 128.6/130/-/- Opteron 248 113.0/112.9 112.3/112.3/-/- 136.4/136.1/-/- Opteron 275 -/- 112.67/112.75/112.6/112.5 135.3/135.3/135.2/134.9 Opteron 250 121.3/121.3 120.8/120.7/-/- 146.9/146.8/-/- XEON 3.4 88.5/88.7 88.1/88.3/-/- 108.6/108.6/-/- XEON 3.6 93.5/94.21 93.4/93.4/-/- 115.6/115.4/-/- XEON 2.4 65.5/65.5 na na XEON 2.8 75.9/75.9 na na Table 3: Summary of Pythia Benchmarking results. We see basically no drop in performance when running up to 4 copies of Pythia simultaneously. There is a 20 % boost in performance when running in (64/64) bit mode compared to (64Bit/32Bit) compatibility mode. There is a small drop in performance ( 1%) for the Opterons when running in compatibility mode compared to legacy mode. electrical power that the computing center can provide. In addition the expenses for cooling and electricity contribute significantly to the operating costs of a computing center. Therefore the power consumption is one of the main criteria when selecting worker nodes for computer farms. Power Consumption CPU Idle Loaded 242 1.2A 1.4A 248 1.4A 1.6A 270 1.2A 1.7A Table 7: Summary of power consumption based on idle load running SL 3.04 (64 Bit) and fully loaded CPUs using a calculation of π and running dd. As shown in Table, the dual core CPUs use about the same amount of power as the single core CPUs. This shows that we in fact get twice the performance using dual-core CPUs for the same cost in power and cooling. The low power Opteron 246HE was measured to use 7% less power when idle and 11% less under load compared to the regular 246. We found the single core Intel processors that we tested significantly less energy efficient than the AMD Opterons. This makes the dual core Opteron chips very attractive for running a cluster of high-performance machines. CONCLUSION It has been shown that for the considered applications the available 64-bit commodity computers from AMD and Intel are a viable alternative to comparable 32-Bit systems. Todays 32 Bit applications like the CMS software tested here can be executed on 64 Bit machines in either 32 bit mode or in 64 Bit compatibility without any modification of the code. This makes the transition to the new platform painless. A significant gain in performance might be achieved porting to the new processor architecture platform in 64 Bit mode. We saw a significant increase in performance (Pythia: 21 % ROOT: 40%) when applications were compiled for 64 Bit. Based on the investigations reported here we chose AMD Opterons for our latest farms purchase and run a 64 Bit OS. We found that Dual-core processors scale well as we run a copy of the application on each available core. The power efficiency makes the AMD dual-core processors a very attractive choice when running farms of high-performance machines. REFERENCES [1] CMS IN 2005/012, Hans Wenzel, Benchmarking AMD Opteron (AMD64) and Intel (EM64T) systems. [2] What s So Great for Developers About the AMD64? David Kaplowitz, June 23, 2003 http://www.devx.com/amd/article/16018. [3] 64-bit Computing with Intel EM64T and AMD AMD64 http://www.redbooks.ibm.com/abstracts/tips0475.html.

ROOT Stress CPU legacy (32/32)Bit Compatibility (64/32)Bit 64-bit (64/64)Bit ROOTMks per core for ROOTMks per core for ROOTMks per core for 1/ 2 processes 1/ 2/ 3/ 4 processes 1/ 2/ 3/ 4 processes Opteron 244 723.6/715.35 672.5/673.3/-/- 1059.9/1050.75/-/- Opteron 265-719.3/718.7/719.6/715.65 1093.3/1085.35/1077.1/1075.9 Opteron 246 797.1/793.45 751.6/748.25/-/- 1176.1/1165.65/-/- Opteron 270-827.3/819.3/819.77/819.33 1201.6/1201.75/1190.9/1182.8 Athlon 64 3500+ - 891.4/-/-/- 1295.9/-/-/- Athlon 64 2 4200+ - 738.6/788/-/- 1120/1142/-/- Opteron 248 894.2/888.4 834.7/835.75/-/- 1320.8/1305.95/-/- Opteron 275-910.2/907.1/904.2/903 1327.96/1335.1/1311.1/1306 Opteron 250 957.6/956.7 898.3/891.35/-/- 1402.6/1395.55/-/- XEON 3.4 964.6/950.05 909.3/903.8/-/- 1145.9/1135.8/-/- XEON 3.6 1034.8/1020.6 969.7/958.7/-/- 1208.6/1194.4/-/- XEON 2.4 590.7/583.75 na na XEON 2.8 731/720.05 na na Table 4: Summary of Root Stress Benchmarking results. We observe that this application scales very well. There is only a 1.6% performance drop when running one copy of the benchmark per core simultaneously. We observe that the application runs about 6% slower in compatibility mode compared to (32/32) mode. The Opterons run more than 40 % faster in 64Bit mode compared to (32/32) mode while the 64 Bit XEONS only gain about 20%. [4] CMS IN 2005/030, Hans Wenzel, Spencer Langley, Mako Furukawa, Benchmarking dual-core AMD Opteron (AMD64) systems. [5] http://www.rocksclusters.org. [6] http://root.cern.ch. [7] T. Sjöstrand, P. Edén, C. Friberg, L. Lönnblad, G. Miu, S. Mrenna and E. Norrbin, Computer Phys. Commun. 135 (2001) 238 (LU TP 00-30, hep-ph/0010017). [8] http://cmsdoc.cern.ch/cmsoo/cmsoo.html. [9] DAR (Distribution After Release) was developed by Natalia Ratnikova. DAR provides a complete runtime environment for various CMS applications but doesn t allow for recompilation. http://home.fnal.gov/ natasha/dar/dar.html. [10] Proceedings of CHEP 2004 Interlaken, Switzerland, P. Wegner, S.Wiesand, 64-Bit Opteron systems in High Energy and Astroparticle Physics.

OSCAR Simulation CPU Operating mode legacy (32/32)Bit Compatibility (64/32)Bit (Evts/sec) per core for (Evts/sec) per core for 1/ 2 processes 1/ 2/ 3/ 4 processes Opteron 244 0.0915/0.0923 0.0866/0.0867/-/- Opteron 265-0.0920/0.0918/0.0916/0.0912 Opteron 246 0.1021/0.1028 0.0956/0.0959/-/- Opteron 270-0.1016/0.1012/0.0999/0.0994 Athlon 64 3500+ - 0.1076/-/-/- Athlon 64 2 4200+ - 0.096/0.098/-/- Opteron 248 0.1147/0.1139 0.1067/0.1074/-/- Opteron 275-0.1089/0.1088/0.1081/0.1073 Opteron 250 0.1186/0.1221 0.1154/0.1156/-/- XEON 3.4 0.1067/0.1060 0.0938/0.0929/-/- XEON 3.6 0.1066/0.105 0.1012/0.1012/-/- XEON 2.4 0.0603/0.0604 na XEON 2.8 0.0715/0.0702 na Table 5: Summary of OSCAR Benchmarking results. Like Root stress results we observe that the application runs about 5 % slower in compatibility mode compared to legacy mode. We observe that OSCAR scales very well resulting in a performance drop of only 0.9%/2.2%/1.6% on the 265/270/275 running up to 4 copies (1 per core) of OSCAR simultaneously. OSCAR is a very CPU and memory intesive application but it doesn t require a lot of io. CPU legacy (32/32)Bit Compatibility (64/32)Bit Digitization Digitization Reconstruction (Evts/sec) for (Evts/sec) per core for (Evts/sec) per core for 1 processes 1/ 2/ 3/ 4 processes 1/ 2/ 3/ 4 processes Opteron 244 0.1171 0.0989/0.098/-/- 0.809/0.806/-/- Opteron 265-0.1096/0.108/0.1092/0.1089 0.9263/0.9268/0.9058/0.9131 Opteron 246 0.1293 0.1092/0.108/-/- 0.901/0.895/-/- Opteron 270-0.1311/0.1290/0.1267/0.1268 1.055/1.057/0.974/0.962 Athlon 64 3500+ - 0.1421/-/-/- 1.118/-/-/- Athlon 64 2 4200+ - - - Opteron 248 0.1430 0.1210/0.1196/-/- 0.974/0.988/-/- Opteron 275-0.142/0.140/0.140/0.138 1.106/1.089/0.946/0.864 Opteron 250 0.1562 0.1300/0.1296/-/- 1.039/1.035/-/- XEON 3.4 0.1403 0.1251/0.1238/-/- 0.929/0.890/-/- XEON 3.6 0.1427 0.1332/0.1310/-/- 1.026/1.019/-/- XEON 2.4 0.0868 na na XEON 2.8 0.0996 na na Table 6: Summary of CMS ORCA Benchmarking results. As the Applications become more IO intense we observe that the efficiency drop increases as the processes compete for disk IO bandwidth. For the ORCA digitization the drop is 0.6 %/3.3 %/2.8 % on the 265/270/275 running 4 copies simultaneously. For the ORCA reconstruction application we observe a significant drop in performance of 1.4 %/8.8 %/21.8 % for the 265/270/275 respectively.