Opus IB Grid Enabled Opteron Cluster with InfiniBand Interconnect

Olaf Schneider, Frank Schmitz, Ivan Kondov, and Thomas Brandel

Forschungszentrum Karlsruhe, Institut für Wissenschaftliches Rechnen,
Hermann-von-Helmholtz-Platz 1, D Eggenstein-Leopoldshafen, Germany
{thomas.brandel,ivan.kondov,frank.schmitz,

B. Kågström et al. (Eds.): PARA 2006, LNCS 4699. © Springer-Verlag Berlin Heidelberg 2007

Abstract. Opus IB is an Opteron based cluster system with InfiniBand interconnect. Grid middleware provides the integration into the CampusGrid and D-Grid projects. Notable details of the hardware and software equipment as well as of the cluster configuration are presented. Performance measurements show that InfiniBand is not only well suited for message-passing based parallel applications but is also competitive as a transport layer for data access in shared cluster file systems and for high throughput computing.

Keywords: Cluster, Grid, Middleware, File system, SAN, InfiniBand.

1 Introduction

Cluster systems with fast interconnects like Myrinet, Quadrics or InfiniBand become more and more important in the realm of high performance computing (HPC). The Institute for Scientific Computing at the Forschungszentrum Karlsruhe was active in adopting and testing InfiniBand technology very early. We started with a small test system in 2002, followed by a Xeon based system with 13 nodes called IWarp. The next generation of InfiniBand cluster is Opus IB, which we describe in this paper.

In the following section we briefly look at the project CampusGrid, in which most of the activities reported here are embedded. The key facts about Opus IB's hardware and software are collected in several subsections of Sect. 3. Thereafter, in Sect. 4, we comment on some measurements which demonstrate the achievable performance with InfiniBand. We conclude the paper with a short survey of Opus IB as part of the D-Grid infrastructure.

2 The CampusGrid Project

The R&D project CampusGrid [1,2] was initiated at the Forschungszentrum Karlsruhe in 2004 with the aim to construct and build a heterogeneous network of resources for computing, data and storage.

Additionally, the project gives users the opportunity to run their applications in such a heterogeneous environment. Grid technologies were selected as a state-of-the-art method to achieve these goals. The use of standard Grid middleware in our local infrastructure is advantageous because it enables the scientists of the Forschungszentrum to smoothly enter the global Grid.

The project started with a testbed for the evaluation of middleware and other components. While the initial testbed was small, it already comprised all kinds of resources in our heterogeneous IT environment: clusters, SMP servers, and vector processors as well as managed storage (SAN). As the project progresses, more and more production systems shall be integrated into the CampusGrid environment. In order to do so we need a clear and smooth migration path from our classical HPC environment into the new Grid-based infrastructure. Thus, in the design of the CampusGrid architecture we have to take care of many boundary conditions we cannot (easily) change in our project, e.g. central user administration via Active Directory Services (ADS). The cluster Opus IB started as part of the CampusGrid testbed and is now growing into a fully productive system.

3 Hardware and Software of Opus IB

3.1 Overview

The name Opus IB is an abbreviation for Opteron cluster with InfiniBand. As the name implies, the cluster is assembled of dual processor nodes with Opteron 248 processors, and the high-performance networking fabric is an InfiniBand switch (InfinIO9000 by SilverStorm). All worker nodes and most cluster nodes run CERN Scientific Linux as operating system (64 bit version). At the time of writing there are 64 worker nodes with 128 CPUs in total and an aggregated memory of about 350 GB. All worker nodes and the switch fabric are built into water cooled cabinets by Knürr. This technology was originally developed for the GridKa [3] cluster.

3.2 InfiniBand

InfiniBand (IB) is a general purpose network and protocol usable for different higher level protocols (TCP/IP, FibreChannel/SCSI, MPI, RFIO/IB) [4]. In contrast to existing interconnect devices that employ a shared-bus I/O architecture, InfiniBand is channel-based, i.e. there is a dedicated path from one communication partner to the other. Links can be aggregated, which is standardized for 4 and 12 links and called 4X and 12X. We use 4X in our installation, which means 1 GB/s of usable bandwidth (in each direction). FibreChannel (FC) bridges plugged into the IB switch enable us to directly connect storage devices in the Storage Area Network (SAN) to the cluster nodes. Thus it is not necessary to equip each node with a FC host bus adapter.

As an off-the-shelf high-speed interconnect InfiniBand is a direct competitor of technologies like Myrinet and Quadrics. Our decision to use InfiniBand in the cluster was mainly due to the positive experiences in recent projects (cf. [5]).
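The 1 GB/s figure for a 4X link follows from the per-lane signaling rate and the 8b/10b line coding of (single data rate) InfiniBand. The short calculation below makes this explicit; the SDR lane speed of 2.5 Gbit/s is general knowledge about the technology, not a value taken from this paper.

# Usable bandwidth of an aggregated InfiniBand link (single data rate, 8b/10b coding).
# Assumes an SDR lane speed of 2.5 Gbit/s; the paper itself only states the 1 GB/s result.

def ib_usable_bandwidth_gbytes(lanes: int, lane_rate_gbit: float = 2.5) -> float:
    """Return the usable unidirectional bandwidth in GB/s."""
    raw_gbit = lanes * lane_rate_gbit      # aggregated signaling rate
    data_gbit = raw_gbit * 8.0 / 10.0      # 8b/10b coding: 8 data bits per 10 line bits
    return data_gbit / 8.0                 # bits -> bytes

if __name__ == "__main__":
    for lanes in (1, 4, 12):
        print(f"{lanes:2d}X: {ib_usable_bandwidth_gbytes(lanes):.2f} GB/s per direction")

For 4X this yields 4 x 2.5 Gbit/s x 0.8 = 8 Gbit/s, i.e. exactly the 1 GB/s per direction quoted above (and 3 GB/s for 12X).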

3.3 Running Two Batch Schedulers Concurrently

A peculiarity of the Opus IB cluster is that all worker nodes are managed by two different job schedulers concurrently. On the one hand, we have the OpenPBS successor TORQUE [6] together with the MAUI scheduler [7]. On the other hand, there is a mixed LoadLeveler cluster which consists of the Opus IB nodes, several PowerPC blades and some other single machines. Recently we added our AIX production system (pSeries 655 and 630) to this mixed cluster.

The reasons for running these two batch systems concurrently are numerous. First the history: when we started assembling the cluster we chose TORQUE for a couple of reasons: it is Open Source and compatible with OpenPBS, but more stable. When IBM first provided LoadLeveler with the mixed cluster option, we decided to try it out of curiosity. Shortly before, we had received some PowerPC blades which could serve as AIX nodes in our testing environment. The Linux part of the testbed was just Opus IB. At that point, the cluster was still in a quite experimental mode of operation. Thus, two batch systems did not cause any problems but were sometimes useful for tests. This configuration survived the gradual change of the cluster into productive operation. Currently, LoadLeveler works very well and is used for the majority of user jobs submitted in the classical way on the command line. On the other hand, Grid middleware more often supports TORQUE/PBS than LoadLeveler. Moreover, the combination with the MAUI scheduler is quite popular in the Grid community. Thus, TORQUE serves as a kind of reference system when using Grid middleware. A third reason is that we want to stay independent of commercial software vendors as far as possible. That means an Open Source solution should be available at least as a fall-back.

Running two job managers concurrently, without either one knowing about the other, of course carries the risk of overloading nodes with too many jobs. In practice, however, we noticed that such problems occur less often than expected. The reason is probably that both schedulers take the actual workload on a node into account when making their scheduling decisions. For the MAUI scheduler this behavior is triggered by setting the configuration parameter MAXLOAD: MAUI marks a node busy if its load exceeds MAXLOAD. The exact value needs some tuning; we used values between 1.1 and 2.5. LoadLeveler prefers the node with the lowest load by default. If overcommitment occurs, it is always very harmful, especially if it affects the workload balance of a parallel job (since a single task is slowed down compared to all other tasks). Recently we tried to solve this kind of problem by adding prolog and epilog scripts to each job. After submission a job waits in a queue until a matching resource is available. Right before job startup the scheduler, say LoadLeveler, runs a prolog script, to which the list of processors (nodes) occupied by the job is passed (via the variable LOADL_PROCESSOR_LIST). The prolog script utilizes this information to decrease the number of job slots in the list of available resources at the other scheduler (i.e. TORQUE). After the job has finished, the slot count is increased again in the epilog script. Thus, we dynamically reconfigure the resources of the second scheduler whenever a job is started by the first scheduler, and vice versa.
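The paper does not reproduce the scripts themselves. The following is a minimal sketch of the prolog side only, assuming a hypothetical, site-specific helper command adjust_torque_slots that performs the actual TORQUE reconfiguration (e.g. via qmgr or pbsnodes); the handling of LOADL_PROCESSOR_LIST follows the description above, everything else is illustrative.

import os
import subprocess
from collections import Counter

# Sketch of a LoadLeveler prolog: count the slots this job occupies per node and
# take the same number of slots away from TORQUE via a (hypothetical) helper
# "adjust_torque_slots <node> <delta>". The epilog would give them back.

def occupied_slots_per_node() -> Counter:
    """LOADL_PROCESSOR_LIST holds one entry per occupied processor; count entries per node."""
    raw = os.environ.get("LOADL_PROCESSOR_LIST", "")
    return Counter(raw.split())

def adjust(delta_sign: int) -> None:
    for node, slots in occupied_slots_per_node().items():
        subprocess.run(["adjust_torque_slots", node, str(delta_sign * slots)], check=True)

if __name__ == "__main__":
    adjust(-1)   # prolog removes slots; the epilog would call adjust(+1)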

3.4 Cluster Management Using Quattor

Automated installation and management of the cluster nodes is one of the key requirements for operating a cluster economically. On Opus IB this is done using Quattor, a software suite developed at CERN [8]. Some features of Quattor are:

- automated installation and configuration
- software repository (access via HTTP)
- Configuration Data Base (CDB) server
- a template language to describe the setup
- Node Configuration Manager (NCM) using information from CDB templates to describe software setup and configuration

The open standards on which the Quattor components are based allow easy customization and the addition of new functionality. For instance, creating a new node configuration component is essentially writing a Perl module. In addition, the hierarchical CDB structure provides a good overview of cluster and node properties. Adding new hardware or changing the installed software on existing hardware is facilitated tremendously by Quattor; the process takes only several minutes.

3.5 Kerberos Authentication and Active Directory

For the CampusGrid project it was decided to use Kerberos 5 authentication with the Active Directory Server as Key Distribution Center (KDC). Thus all Opus IB nodes are equipped with Kerberos clients and a Kerberos-enabled version of OpenSSH. As a work-around for the missing Kerberos support in the job scheduling systems (PBS, LoadLeveler), we use our own modified version of PSR [9], which incorporates Kerberos 5 support.

While Kerberos is responsible for authentication, the identity information stored in the passwd file still needs to be transferred to each node. For this purpose we use a newly developed Quattor component, which retrieves the necessary data via LDAP from the Active Directory and then distributes it to the cluster by the usual Quattor update mechanism.
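The Quattor component itself is a Perl module and is not shown in the paper. As a rough illustration of the data flow it implements, the snippet below pulls account attributes from an Active Directory via LDAP and prints passwd-style lines; the server address, search base, bind credentials and the attribute-to-field mapping are assumptions for the example, and python-ldap merely stands in for the Perl tooling.

import ldap  # python-ldap, used here only as a stand-in for the Quattor component

# Connection parameters are placeholders, not the site's real configuration.
AD_URI, BASE_DN = "ldap://ads.example.org", "ou=users,dc=example,dc=org"
BIND_DN, BIND_PW = "cn=reader,dc=example,dc=org", "secret"

def passwd_lines():
    con = ldap.initialize(AD_URI)
    con.simple_bind_s(BIND_DN, BIND_PW)
    attrs = ["sAMAccountName", "uidNumber", "gidNumber", "displayName",
             "unixHomeDirectory", "loginShell"]
    for _dn, entry in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE, "(objectClass=user)", attrs):
        def get(attr, default=""):
            return entry.get(attr, [default.encode()])[0].decode()
        # passwd format: name:x:uid:gid:gecos:home:shell (passwords are handled by Kerberos)
        yield ":".join([get("sAMAccountName"), "x", get("uidNumber"), get("gidNumber"),
                        get("displayName"), get("unixHomeDirectory", "/home/unknown"),
                        get("loginShell", "/bin/bash")])

if __name__ == "__main__":
    print("\n".join(passwd_lines()))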

3.6 StorNext File System

SNFS is a commercial product by ADIC [10]. It has several features which support our goal of providing seamless access to a heterogeneous collection of HPC and other resources:

- Native clients are available for many operating systems (Windows, AIX, Linux, Solaris, IRIX).
- The metadata server does not require proprietary hardware.
- Active Directory integration is part of the current version.
- We obtained very good performance results in our evaluation (cf. Sect. 4).
- Installation and management are simpler than in competing products.

A drawback is that file system volumes cannot be enlarged during normal operation; a maintenance period is required.

3.7 Globus Toolkit 4

In the CampusGrid project we decided to use Globus Toolkit 4 (GT4) as the basic middleware. For an overview of the features and concepts of GT4 we refer to Foster [11] and the documentation of the software [12]. The current configuration for the Opus IB cluster is depicted in Fig. 1. We use the usual Grid Security Infrastructure (GSI); the only extension is a component to update the grid-mapfile with data from the Active Directory.

Fig. 1. WS-GRAM for job submission on Opus IB, with identity management using the Active Directory Server (ADS).
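The grid-mapfile simply maps certificate subject DNs to local account names, one entry per line. A minimal sketch of the update step is shown below; how the (DN, account) pairs are obtained from the Active Directory is omitted (it could reuse an LDAP query like the one sketched for Sect. 3.5), and the file path, the demo DN and the temporary-file-then-rename strategy are illustrative assumptions, not details taken from the paper.

import os

MAPFILE = "/etc/grid-security/grid-mapfile"   # conventional location, assumed here

def write_grid_mapfile(entries, path=MAPFILE):
    """entries: iterable of (certificate subject DN, local user name) tuples."""
    tmp = path + ".new"
    with open(tmp, "w") as fh:
        for subject_dn, account in sorted(entries):
            fh.write(f'"{subject_dn}" {account}\n')   # standard grid-mapfile line format
    os.replace(tmp, path)                             # switch atomically to the new mapfile

if __name__ == "__main__":
    demo = [("/O=GermanGrid/OU=FZK/CN=Jane Doe", "jdoe")]   # placeholder entry
    write_grid_mapfile(demo, path="grid-mapfile.test")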

For LoadLeveler jobs there is an additional WS-GRAM adapter and a scheduler event generator (SEG). Actually there are two of them: one running on the GT4 server mentioned above and a second one running on a PowerPC machine with AIX. The latter was installed to test GT4 with AIX. The cluster monitoring data gathered by Ganglia [13] are published in MDS4, the GT4 monitoring and resource discovery system (using the Ganglia Information Provider shipped with Globus).

4 Benchmarks

4.1 Data Throughput

Right from the start, our objective in evaluating InfiniBand technology was not only the fast inter-node connection for message-passing based parallel applications. We also focused on the data throughput in the whole cluster. The large data rates achievable with InfiniBand are appropriate for accessing large amounts of data from each node concurrently via a file system or other protocols like RFIO. Preliminary results of our studies can be found in [5]. Later on we successfully tested the access to storage devices via an InfiniBand-FibreChannel bridge together with a Cisco MDS9506 SAN director. These tests were part of a comprehensive evaluation process with the aim of finding a SAN based shared file system solution with native clients for different architectures. Such a heterogeneous HPC file system is intended as a core component of the CampusGrid infrastructure. A comparison of StorNext FS (cf. Sect. 3.6) with two competitors (SAN-FS by IBM and CXFS by SGI) is given in Table 1.

Table 1. Write throughput in MB/s for different file systems and client hardware, varying file sizes and fixed record size of 8 MB (clients: SunFire with IB running SNFS and CXFS; p630 with AIX running SNFS, CXFS and SAN-FS; file sizes of 64 MB and above)

The measurements are done using the benchmark software IOzone [14]. Write performance was always measured such that the client waits for the controller to confirm the data transfer before the next block is written. This corresponds to the behavior of NFS mounted with the option sync. Due to compatibility issues it was not possible to install the SAN-FS client on our SunFire nodes with InfiniBand and Opteron processors. The reported values rely on sequentially written files of various sizes from 128 kB to 4 GB, doubling the size in each step, while the record size goes from 64 kB to 8 MB. Typically, a monotonic increase of data throughput with growing file and record size can be observed. This behavior is depicted in Fig. 2.
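The following snippet illustrates the measurement principle described above (synchronous sequential writes, doubling file sizes at a fixed record size). It is not the IOzone invocation used for Table 1, just a minimal stand-alone sketch; the file sizes, the output path and the use of O_SYNC as the synchronization mechanism are illustrative assumptions.

import os
import time

RECORD = 8 * 1024 * 1024                                    # fixed record size of 8 MB
SIZES = [2 ** n * 1024 * 1024 for n in (6, 7, 8, 9, 10)]    # 64 MB ... 1 GB, doubling

def sync_write_throughput(path, file_size, record=RECORD):
    """Write file_size bytes in record-sized chunks with O_SYNC and return MB/s."""
    block = b"\0" * record
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC)
    start = time.time()
    written = 0
    while written < file_size:
        written += os.write(fd, block)   # with O_SYNC, blocks until the storage confirms the data
    os.close(fd)
    return file_size / (1024 * 1024) / (time.time() - start)

if __name__ == "__main__":
    for size in SIZES:
        mbps = sync_write_throughput("/tmp/iotest.dat", size)
        print(f"{size // (1024 * 1024):5d} MB file: {mbps:7.1f} MB/s")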

Fig. 2. Write performance of SNFS on the SunFire with InfiniBand using the IB-FC bridge (throughput in MByte/s versus record size in kByte, one curve per file size from 1 MB upwards)

All measurements are done using a disk-storage system by Data Direct Networks (S2A 8500). The connection between the SAN fabric and the IB-FC bridge or, accordingly, the p630, was a 2 Gigabit FibreChannel link. Thus, the overall bandwidth from the cluster to the storage is limited to the capacity of this link. For file system access with its considerable overhead we cannot expect more than about 180 MB/s (or, correspondingly, 1.5 GBit/s). Measurements [15] show that SNFS behaves well if more than one client accesses the same file system.

4.2 Parallel Computing

Besides serial applications with high data throughput, the application mix on Opus IB contains typical MPI applications from several scientific domains (climate simulation, CFD, solid state physics and quantum chemistry). The parallel floating-point performance, which is relevant for the latter applications, was benchmarked using HPL [16]. It was compiled and linked on Opus IB using all available C compilers (GCC, Intel, PGI) and the libraries MVAPICH [17] and ATLAS [18]. The latter was compiled with the architecture defaults from the vendor. The tests were performed on up to 18 nodes. Figure 3 shows the measured performance. It scales linearly with the number of processors. The performance per node is quite constant between 3.7 and 3.3 Gflops, which corresponds to about 80% of the peak performance.
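The "about 80% of peak" statement can be checked against the theoretical peak with a short calculation, assuming that the quoted 3.3 to 3.7 Gflops refer to a single Opteron 248 (2.2 GHz, two double-precision floating-point operations per cycle); these processor parameters are general knowledge and not stated in the paper itself.

# Rough HPL efficiency check under the stated assumptions about the Opteron 248.
CLOCK_GHZ = 2.2
FLOPS_PER_CYCLE = 2

peak_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE       # theoretical peak per processor: 4.4 Gflops

for measured in (3.7, 3.3):                     # range quoted in the text
    print(f"{measured} Gflops -> {100 * measured / peak_gflops:.0f}% of peak")

# prints roughly 84% and 75%, i.e. about 80% of peak on average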

Fig. 3. Total performance (Gflops) of the HPL test RR00L2L2 for the maximal problem size versus the number of processors. The HPL benchmark was compiled with the GNU C compiler.

Fig. 4. Performance comparison of the HPL test WR00L2L2 on two processors with three different compilers (performance per processor in Gflops versus problem size for the GNU, Intel and PGI C compilers).

Comparing the compilers, we see that the GNU C compiler performs best in our tests (cf. Fig. 4). However, for small problems (up to size 1000) the actual choice of the compiler does not matter. For larger problems (size 18000) the PGI code is about 25% slower, while the lag of the Intel compiler is moderate. These results should not be misconstrued in the sense that GCC always produces better code than its commercial competitors. Firstly, real applications do not behave exactly like the benchmark. Secondly, according to our experience, PGI performs much better with Fortran code (see also [19]).
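The Gflops values behind Figs. 3 and 4 come from the result lines that HPL prints for each run (the lines tagged with the test name, e.g. WR00L2L2). A small helper like the one below can collect them; it assumes the usual column order of the HPL summary line (T/V, N, NB, P, Q, Time, Gflops) and is not part of the paper's tooling.

import sys

def parse_hpl_results(lines):
    """Yield (test_name, problem_size_N, gflops) from HPL output lines.

    Assumes the standard HPL summary line layout: T/V  N  NB  P  Q  Time  Gflops.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0].startswith(("WR", "RR")):
            name, n, _nb, _p, _q, _time, gflops = fields
            yield name, int(n), float(gflops)

if __name__ == "__main__":
    # e.g.: python parse_hpl.py < HPL.out
    for name, n, gflops in parse_hpl_results(sys.stdin):
        print(f"{name}  N={n:6d}  {gflops:8.3f} Gflops")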

5 D-Grid

The D-Grid initiative aims at designing, building and operating a network of distributed, integrated and virtualized high-performance resources and related services which allow the processing of large amounts of scientific data and information. D-Grid currently consists of the integration project (DGI) and six community projects in several scientific domains [20]. One work package of the integration project is to build an infrastructure called Core D-Grid. As part of this core infrastructure, Opus IB should be accessible via the three middleware layers GT4, UNICORE [21] and gLite [22]. The GT4 services are the same as for CampusGrid, plus an OGSA-DAI interface to a 1 TB MySQL database (cf. [23]) and integration into the MDS4 monitoring hierarchy of D-Grid. The integration with the gLite middleware suffers from the still missing 64 bit port of gLite; thus, 32 bit versions of the LCG tools must be used. The jobs submitted via gLite are scheduled on the cluster via TORQUE. UNICORE also supports TORQUE, so we can use one local job scheduler on the cluster for all three middlewares. Each middleware should run on a separate (virtual) machine which acts as a submitting host for the local batch system. So far, only a few nodes are equipped with the gLite worker node software. A complete roll-out of the system with access via all three middlewares (GT4, gLite, and UNICORE) is scheduled for the first quarter. At that time we will be productive with an extended D-Grid infrastructure (for example, we are adding 32 nodes to Opus IB with two dual core processors each). A detailed report about configuration details and experiences will be the subject of a separate publication.

Acknowledgments. The authors thank all colleagues for their support in the daily management of Opus IB and other systems and for helpful suggestions regarding our research activities. Great thanks are due to the developers of the Open Source software tools we use in our projects.

References

1. Institut für Wissenschaftliches Rechnen, Forschungszentrum Karlsruhe: CampusGrid (2005)
2. Schneider, O.: The project CampusGrid. NUG-XVI General Meeting, Kiel (May 24-27, 2004)

3. Institut für Wissenschaftliches Rechnen, Forschungszentrum Karlsruhe: Grid Computing Centre Karlsruhe (GridKa) (2005)
4. InfiniBand Trade Association: InfiniBand Architecture (2006)
5. Schwickerath, U., Heiss, A.: First experiences with the InfiniBand interconnect. Nuclear Instruments and Methods in Physics Research A 534 (2004)
6. Cluster Resources Inc.: TORQUE Resource Manager (2006)
7. Cluster Resources Inc.: Maui Cluster Scheduler (2006)
8. Quattor development team: Quattor. System administration toolsuite (2006)
9. The LAM team: Password Storage and Retrieval System (2006)
10. Advanced Digital Information Corporation (ADIC): StorNext File System (2006)
11. Foster, I.T.: Globus Toolkit Version 4: Software for Service-Oriented Systems. In: Jin, H., Reed, D., Jiang, W. (eds.) NPC. LNCS, vol. 3779. Springer, Heidelberg (2005)
12. The Globus Alliance: Globus Toolkit 4.0 Release Manuals (2006)
13. The Ganglia Development Team: Ganglia (2006)
14. Capps, D.: IOzone Filesystem Benchmark (2006)
15. Schmitz, F., Schneider, O.: The CampusGrid test bed at Forschungszentrum Karlsruhe. NUG-XVII General Meeting, Exeter, GB (May 25-27, 2005)
16. Petitet, A., Whaley, R.C., Dongarra, J., Cleary, A.: HPL: A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers (2004)
17. Network-Based Computing Laboratory, Dept. of Computer Science and Engg., The Ohio State University: MVAPICH: MPI over InfiniBand project (2006)
18. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimization of software and the ATLAS project. Parallel Computing 27(1-2), 3-35 (2001)
19. Research Computing, Information Technology Services, University of North Carolina: Compiler Benchmark for LINUX Cluster (2006), bench.html
20. The D-Grid Initiative: D-Grid, an e-Science framework for Germany (2006)
21. The Unicore Project: UNICORE (Uniform Interface to Computing Resources) (2005)
22. The EGEE Project: gLite. Lightweight Middleware for Grid Computing (2006)
23. Jejkal, T.: Nutzung der OGSA-DAI Installation auf dem Kern-D-Grid (in German) (2006), GRID/DGrid/Kern-D-Grid.html
