A Long-distance InfiniBand Interconnection between two Clusters in Production Use

Sabine Richling, IT-Center, University of Heidelberg, 69120 Heidelberg, Germany
Heinz Kredel, IT-Center, University of Mannheim, Mannheim, Germany
Steffen Hau, IT-Center, University of Mannheim, Mannheim, Germany, hau@rz.uni-mannheim.de
Hans-Günther Kruse, IT-Center, University of Mannheim, Mannheim, Germany, kruse@rz.uni-mannheim.de

ABSTRACT
We discuss operational and organizational issues of an InfiniBand interconnection between two clusters over a distance of 28 km in day-to-day production use. We describe the setup of hardware and networking components, and the solution of technical integration problems. Then we present solutions for a federated authorization system for the cluster within our two participating universities and other organizational integration problems. Performance measurements for MPI communication and file access to Lustre storage systems are presented. The results and a simple performance model show that MPI performance is intrinsically poor across the long-distance interconnection with its limited bandwidth. However, file access and MPI communication among nodes on each side are barely affected by the limitations of the interconnection, even at high load. Our organizational and technical setup allows the operation of the two clusters as a single system with lower administration costs and a better load balance than in a disconnected setup.

Categories and Subject Descriptors
C.4 [Performance of Systems]: Reliability, availability, and serviceability

General Terms
Performance, Management

Keywords
operating clusters, long-distance InfiniBand, performance model

Copyright is held by the author/owner(s). SC'11, November 12-18, 2011, Seattle, Washington, USA. ACM /11/11

1. INTRODUCTION
The IT-centers of the universities of the German federal state Baden-Württemberg face an increasing demand for high-performance computing capacities from various scientific communities. These demands are, however, not high enough to qualify for the top German HPC centers in Jülich, Munich or Stuttgart. In order to improve the availability of such compute resources, the HPC center of Stuttgart wrote a proposal for a grid infrastructure concept for the universities. The concept favoured a distributed computing resource near the prospective users. As an improvement of the concept, the IT-centers of Heidelberg and Mannheim planned to connect their two clusters to ease the operation and improve resource utilization. Both IT-centers have a long record of cooperation and are connected by a 10 Gbit capable dark fibre connection, with two 10 Gbit color lines already used for backup and other services. In this report we describe the setup of the two clusters and the concept of the InfiniBand connection over Ethernet over fibre optics. We show that with our concept the two clusters can be operated as a single system with low administration costs. The single batch system enforces that all nodes for a job are allocated on one side of the cluster. By this we optimize MPI performance, which would be insufficient for communication between nodes on opposite sides of the 28 km connection. We developed a performance model to verify this.

1.1 Related Work
The performance of InfiniBand for SAN and WAN cluster connections has been studied in [20], [25], [22], and [1]. To our knowledge there are no reports about the suitability and stability of such connections in day-to-day production use.
There are various attempts at software architectures and setups of meta-scheduling for geographically distant grid clusters [2, 10, 4, 8, 5], which we have avoided by our approach.

Several national Grid initiatives, like the German D-Grid, also address the problem of a meta user administration. This is resolved mainly by a PKI infrastructure (operated by DFN, the German research network) with certified and approved users in virtual organizations (VOs) and certified sites organized as resource providers. In our case we use connections to the locally established user administration, authentication and authorization together with the D-Grid PKI to authenticate and authorize D-Grid users. We have not studied usability, security and other issues of the PKI infrastructure in our setting. We also have not investigated the suitability of Shibboleth [23], the DFN-AAI [3], Moonshot [16], or Eduroam [6] for user administration between Mannheim and Heidelberg up to now. This is an ongoing research topic within bwgrid. Our performance modeling is based on the Linpack benchmark with its specific computational workload and communication patterns. For the study of other application workloads in a wide-area setting see for example [19] or [15]. Parts of this report are published in [21].

1.2 Outline of the report
In section 2 we introduce the bwgrid cooperation project. Then section 3 describes the setup of the two clusters and the InfiniBand fiber optic connection between both. The challenges in operating two geographically separated clusters as a single system are discussed in section 4. MPI performance measurements and a model for them, as well as Lustre performance measurements, are presented in section 5. The last section 6 draws some conclusions.

2. BWGRID COOPERATION
Since September 2005 the German grid activities have been bundled by the D-Grid Initiative. Its aim is the development and establishment of a reliable and sustainable grid infrastructure for e-science in Germany, and it has received about 50 million Euro in funding from the Federal Ministry of Education and Research (BMBF). Part of these activities is a community project of the universities of Baden-Württemberg (BW), bwgrid. It consists of compute clusters at the universities in Stuttgart, Ulm (together with Konstanz), Karlsruhe, Tübingen, Freiburg and Mannheim together with Heidelberg. Besides local storage at each site, there is a central storage unit in Karlsruhe. The funding requires all sites to provide access to all D-Grid virtual organizations (VOs) by at least one middleware from the supported D-Grid software stack. So the bwgrid architecture is a distributed system with local administration by the IT-centers of each university. Access to each cluster is provided by local university accounts (via ssh) and by D-Grid certificates with membership of an accredited VO (using grid middleware, e.g. with Globus Toolkit gsissh, GridFTP or Web Services). Local accounts allow only access to local facilities, but grid certificates allow access to all sites. Each compute cluster consists of 140 nodes at the sites Karlsruhe, Mannheim, Heidelberg, Tübingen and Freiburg, 280 nodes at Ulm (housing also the nodes of Konstanz) and 420 nodes at Stuttgart. The central storage, located at Karlsruhe, consists of 128 TB with backup and 256 TB without backup. The common software available at each site is a Scientific Linux operating system with the PBS batch system and Maui (now Moab) scheduler, and the GNU and Intel compiler suites. This software stack is mainly a requirement by D-Grid but also by bwgrid.
There is a central repository for software modules for distribution, which consists of several MPI versions, mathematical libraries and various free software.

3. INTERCONNECTION OF TWO BWGRID CLUSTERS
In this section we summarize the hardware of the bwgrid clusters in Mannheim and Heidelberg and the interconnect between the two clusters. There are 10 bladecenters in Heidelberg and 10 bladecenters in Mannheim. Each bladecenter contains 14 IBM HS21 XM blades, and each blade contains 2 Intel Xeon CPUs at 2.8 GHz (each CPU with 4 cores), 16 GB memory, a 140 GB hard drive (since January 2009), Gigabit Ethernet (1 Gbit) and an InfiniBand network (20 Gbit). This makes a total of 1120 CPU cores on each side. Since May 2009 local bwgrid storage systems with 32 TB and a Lustre parallel file system have been available at both sites. The decision for a Lustre file system and not IBM GPFS or others was made during the procurement for the storage system.

The concept for the interconnection of the two clusters was based on a proposal and acquisition procedure in 2008; the interconnection was assembled in May 2009 and has been running since July 2009. The main technical part is InfiniBand over Ethernet over fibre optics, which is provided by the Longbow adaptor from Obsidian (see figure 1, [17]). This adaptor has an InfiniBand connector (black cable) and a fibre optic connector (yellow cable) and does the packaging of the InfiniBand protocol within optical Ethernet. The universities already had a dark fibre connection between the two IT-centers, which was used for (file) backup and a fast campus connection up to then. An additional component from ADVA had to be used for the transformation of the white light from the Longbow to light of one color transmitted over the dark fibre.

Figure 1: Obsidian Longbow

MPI performance over such an interconnection was measured by our colleagues from HLRS (the high performance computing center in Stuttgart) for different distances [9]. They suggest that one can expect a bandwidth of 900-1000 MB per second for distances up to 50-60 km. Latency was not published. Our own measurements find a latency of 2 µsec for a local (inner-cluster) connection and 145 µsec over the interconnection to the other site. The bandwidth is 1400 MB/sec locally and 930 MB/sec over the interconnection (see figure 2). For message sizes of 10^9 bytes the remote bandwidth is about 35% lower than the local value. For smaller message sizes, the situation is worse.

Figure 2: MPI performance Mannheim-Heidelberg (IMB 3.2 PingPong: time and bandwidth as a function of the buffer size, local and MA-HD)

Our experiences with the interconnection network can be summarized as follows. The cable distance from Mannheim to Heidelberg is 28 km (18 km linear distance in air). Calculating with a speed of light in a fibre optic cable of 65% of the speed of light in vacuum, the light needs about 143 µsec for this distance. The interconnection latency is high: 145 µsec = light transit time + 2 µsec, compared to a local latency of 1.99 µsec for point-to-point communications. The bandwidth is as expected: about 930 MB/sec over the interconnect compared to the local bandwidth of about 1400 MB/sec.
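For illustration, the transit-time estimate above can be reproduced with a few lines of Python (a minimal sketch; the constants are the ones quoted in the text):

    # Light transit time over the 28 km fibre at 65 % of the vacuum speed of light,
    # compared with the measured point-to-point latencies quoted in the text.
    C_VACUUM = 299_792_458.0          # speed of light in vacuum [m/s]
    fibre_speed = 0.65 * C_VACUUM     # effective speed of light in the fibre [m/s]
    cable_length = 28_000.0           # Mannheim-Heidelberg cable distance [m]

    transit_us = cable_length / fibre_speed * 1e6
    local_latency_us = 2.0            # measured local (intra-cluster) latency
    remote_latency_us = 145.0         # measured latency across the interconnection

    print(f"light transit time      : {transit_us:6.1f} microseconds")   # about 143.7
    print(f"transit + local latency : {transit_us + local_latency_us:6.1f}")
    print(f"measured remote latency : {remote_latency_us:6.1f}")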

In the first measurements we observed a bandwidth of only 430 MB/sec. This low number was due to the fact that Obsidian required a different license for a 40 km distance, and we initially only bought a license for 10 km. The Longbow adapter has internal message buffers suitable for larger distances, but they are only activated with the correct license. The effect is shown in figure 3.

Figure 3: Influence of the Obsidian license (IMB PingPong bandwidth, buffer size 1 GB)

4. CLUSTER OPERATION
In this section we discuss the concepts for operating two clusters at different locations and embedded in two different universities. Contrary to HPC clusters for computer science research in Grid environments, like [5, 8], we have to provide a production environment for scientific communities outside computer science. So we cannot operate in a relaxed environment like SGE at [5], but have to ensure strong privacy guarantees where, for example, no user can access the nodes of another user. Further, we couple the user administration as much as possible to the local user administrations instead of setting up a disjoint user administration as in [5, 8]. This spares maintenance costs for our personnel by reusing established university IT administration processes and improves usability for less IT-affine users from outside computer science, say from economics or the humanities.

We begin with a schematic overview of the components of the two clusters in figure 4. The figure shows the two clusters as (blue) boxes labeled Cluster connected by (orange) lines for the InfiniBand connection. The box Obsidian and ADVA represents the 28 km fibre connection between both sites. The Ethernet connection between all components is not shown; only the 1 Gbit connections to the outside Internet Belwue (BW science net) from the user access servers labeled User on the top are shown. The center (green) oval labeled Admin shows the central admin server with PBS and passwd user administration. The other parts of the user administration are shown in the (yellow) circles: VORM from D-Grid, LDAP from Mannheim and AD from Heidelberg. The LDAP and AD connections are unique compared to known Grid environments.
Figure 4: Cluster connection components

4.1 Node management
The management of the compute nodes is straightforward. The main challenges are the increased security requirements in the production environment, for example, very fast reactions to Linux (kernel) exploits. The cluster is designed to be operated by only one administration server system. The compute nodes have no operating systems installed on their local hard disks, but are booted via PXE and use an NFS read-only export as their root file system instead. This has a great security benefit, as no malicious software can manipulate critical system files and libraries. The administration server provides the NFS export as well as a queuing and scheduling system for the jobs.

It also exports a directory via NFS where site-specific and commonly used software is installed. Users can use these software packages very easily with the module utilities (modules.sourceforge.net). They ensure that the environment contains everything the software needs to run. The installation of new software for the users is very easy: only a special file for the module utilities has to be created in order to use it.

The administration and the maintenance of such a cluster is a comprehensive task. Performing firmware, operating system and software updates as well as powering on/off or rebooting compute nodes is no longer possible in a conventional way with such an amount of compute nodes. A collection of shell scripts alleviates this work enormously. The first versions of the scripts were developed by HLRS and were improved and adjusted by all sites to fit their needs. As mentioned in section 3, the compute nodes are IBM HS21 XM blades. The IBM bladecenter technology offers an easy way to manage blades. A management module is installed in each bladecenter and provides two different ways of access: a command line interface (cli) via ssh/telnet and a Web GUI accessible with a web browser. The shell scripts to power on/off and reboot the blades use the cli (via ssh) and dispatch commands to perform the desired action. If a node has crashed, it can be rebooted by executing bldctrl_poweroff_blade <hostname>. The complete cluster can be powered off in the same way by executing bldctrl_poweroff_cluster. Another script, doall, executes a given shell command on all nodes. A global BIOS update, for example, is possible by executing update_bios on all nodes. As the files for the OS are provided by a single NFS export, OS upgrades only need to be applied in one place and the compute nodes are immediately up to date. Only kernel and critical system library updates need a reboot.

The administration server further provides a DHCP service for the compute nodes. There is no need to manage network configuration files on each blade. For each compute node its individual IP address and appropriate entries in the DHCP configuration file must be defined. The MAC address is the unique criterion to distinguish the nodes. The management module gives a detailed overview of each installed blade in the particular bladecenter, including the MAC addresses. A shell script goes through all available management modules (using the cli), asks them for the MAC addresses of each installed blade and generates the unique entries for the DHCP service. A special naming convention makes the calculation of the IP addresses of the compute nodes for the DHCP entries very easy. Each bladecenter provides space for 14 blades and each chassis for 4 bladecenters. So the node name n138, for example, is the blade in slot 8 of bladecenter 3 in chassis 1. By this scheme, the position of the blade within the cluster is counted and the result is added to the IP address where the cluster subnet starts. If a count exceeds 256, the preceding IPv4 position is incremented accordingly.
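The name-to-address mapping can be sketched in Python as follows (a simplified illustration; the subnet base, the ISC dhcpd style of the generated entry and the MAC address are assumptions, not taken from the actual configuration):

    import ipaddress

    SUBNET_BASE = ipaddress.IPv4Address("10.1.0.10")   # hypothetical start of the cluster subnet
    BLADES_PER_BLADECENTER = 14
    BLADECENTERS_PER_CHASSIS = 4

    def node_ip(name: str) -> ipaddress.IPv4Address:
        """Derive a node's IP address from its name, e.g. 'n138' = chassis 1,
        bladecenter 3, slot 8, following the naming convention described above."""
        chassis, bladecenter, slot = int(name[1]), int(name[2]), int(name[3:])
        position = ((chassis - 1) * BLADECENTERS_PER_CHASSIS
                    + (bladecenter - 1)) * BLADES_PER_BLADECENTER + (slot - 1)
        # the ipaddress module carries overflows into the preceding octet automatically
        return SUBNET_BASE + position

    def dhcp_entry(name: str, mac: str) -> str:
        """Emit an ISC dhcpd style host entry for one blade, with the MAC address
        as reported by the bladecenter management module."""
        return (f"host {name} {{ hardware ethernet {mac}; "
                f"fixed-address {node_ip(name)}; }}")

    print(dhcp_entry("n138", "00:1a:64:00:00:01"))   # hypothetical MAC address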
4.2 User management
The user management with life-cycle management and the user access policies and procedures designed for the coupled Mannheim and Heidelberg clusters are unique in bwgrid and also in other Grid environments. bwgrid currently investigates if a similar federated user management can be set up for all bwgrid partners.

The queuing and scheduling systems are configured to give users exclusive access to compute nodes. Several steps must be performed in order to guarantee this. A compute node needs to know the user in order to start a job with the correct privileges. However, no other user should be able to log in to compute nodes which are already occupied, so user-ids must essentially be unique for the queuing and scheduling systems. To avoid unauthorized logins, we first modified the passwd file on each node allocated to a batch job. If a job starts, the head node searches for the user who submitted the job in a complete database and adds this entry, together with a minimal database, to an intermediate passwd file. When the job is finished, this intermediate passwd file is overridden by the complete one. However, the handling of these files was unreliable in the case of job crashes. So we replaced this mechanism by a direct connection to PBS for user authorization via PAM with the module pam_pbssimpleauth. This PAM module is installed and configured once on each node. If a user wants to log in on a node, PAM asks PBS whether this node is allocated to a job of exactly this user, and allows or denies access accordingly.

Simple Perl scripts extract the users together with access privileges, their addresses for the mail system and the groups from a directory service. The directory services at both universities differ fundamentally, but both provide access via an LDAP interface. Therefore, the scripts only need tiny modifications to fit the specific conditions and configurations at each university.

Figure 5: Building configuration files from the information of different directory services

Before the interconnection, both clusters had completely separated user administrations and were connected to fundamentally different user management systems. Combining information from both systems leads to problems with duplicate user names and user-ids as well as group names and group-ids (see also figure 5). Since the group attributes are not critical for authentication, we simply add the prefixes hd or ma to the group names and two different offsets to the group-ids. The group math from Mannheim, for example, with group-id 38 is changed to mamath with group-id 138; the group math from Heidelberg with group-id 219 is renamed to hdmath and its group-id is shifted by the Heidelberg offset. These changes are also applied to the group-id values of each user before the entry is written to the passwd database.
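A minimal sketch of this prefix-and-offset scheme is shown below (in Python, for illustration only; the offsets, the input format and the member lists are assumptions, and the real scripts are Perl scripts that read the data from the LDAP and AD interfaces):

    # Merge group entries from the two directory exports into one cluster-wide
    # group database, making names and ids unique by prefix and offset.
    SITE_RULES = {
        "ma": ("ma", 1000),   # Mannheim: prefix "ma", hypothetical group-id offset
        "hd": ("hd", 2000),   # Heidelberg: prefix "hd", different hypothetical offset
    }

    def merge_groups(exports):
        """exports: {site: [(group_name, group_id, members), ...]}
        Returns /etc/group style lines with unique names and ids."""
        lines = []
        for site, groups in exports.items():
            prefix, offset = SITE_RULES[site]
            for name, gid, members in groups:
                lines.append(f"{prefix}{name}:x:{gid + offset}:{','.join(members)}")
        return lines

    example = {
        "ma": [("math", 38, ["alice"])],   # becomes mamath with a shifted group-id
        "hd": [("math", 219, ["bob"])],    # becomes hdmath with a different offset
    }
    print("\n".join(merge_groups(example)))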

In the same way, with adding offsets and prefixes, it would be possible to have unique user-ids, as they are only used within the cluster and do not interact with any systems outside the cluster. However, prefixed user-ids are not available in the local directory services. They do not export user passwords, neither in cleartext nor in encrypted form (knowing the passwords is not desirable anyway, nor is having to keep them in sync). The authentication must therefore be performed against these directory services. This makes prefixing user names, in the same way as with groups, impossible. Other possibilities, like authenticating users by adding RADIUS realms and mapping them to cluster-unique names, have been discussed together with many people involved in the user management at both universities. All these efforts would create too much overhead and would require too many adjustments in the user management processes. In the end, it was decided to choose the easiest way for achieving unique user names: do not give access to a user whose user name has already been given access to the cluster at the other university. If such a situation occurs, the applicant must obtain a second user account with a different name, which is then given access to the cluster. Activating a user account for the cluster now consists of two steps:
1. adding a special attribute for the user or adding the user to a special LDAP subtree (for authentication),
2. updating the user database cluster-wide (for authorization).

4.3 Job management
The interconnection itself provides enough bandwidth for I/O operations on the file systems, but is possibly not sufficient for all kinds of MPI jobs over InfiniBand (see the next section for details and the explanation). Although there might be workload patterns and experienced users that can tolerate such an environment, as a service provider we must prepare for the worst case. That is, to avoid user frustration with insufficient network performance for Linpack-like workloads, we chose to avoid such a situation. Therefore, a job should only run on nodes which are physically connected to the same switch fabric. A simple way to ensure this is to make use of attributes which the queuing system provides to describe a node. Each node located in Mannheim has the attribute lma, the nodes in Heidelberg lhd respectively (a small configuration sketch is given at the end of this section). The scheduling system is then configured to schedule jobs only on nodes which have either the lma or the lhd attribute.

Before the interconnection, the utilization of the two independent clusters varied highly. Most users in Mannheim only occupied single nodes, because their software is mostly not capable of using multiple nodes. In contrast, the users in Heidelberg made heavy use of MPI and therefore used a lot of nodes. This led to jobs waiting in the queues for a long time until enough resources were available. The interconnection made the free resources in Mannheim available to the users in Heidelberg. In figure 6, weeks 29 and 30 show the utilization in Mannheim before the interconnection and weeks 31 and 32 after. The Ganglia monitoring system was later deactivated, since it interfered too much with other jobs. We will consider a better configuration of Ganglia or other, possibly less intrusive, monitoring systems in the future, for example GridICE or MonALISA.

Figure 6: Ganglia monitoring report (number of processes and percent CPU usage) at the activation of the interconnection
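The node attributes mentioned above can be illustrated with a short sketch that emits node definitions for the batch system (a hypothetical excerpt in the style of a Torque/PBS nodes file; the host names and core counts are assumptions):

    # Emit batch-system node definitions carrying a site property (lma or lhd),
    # so that the scheduler can keep every job on one side of the coupled cluster.
    # The "<host> np=<cores> <property>" style follows Torque/PBS conventions.
    CORES_PER_NODE = 8

    def nodes_file_lines(mannheim_nodes, heidelberg_nodes):
        lines = [f"{n} np={CORES_PER_NODE} lma" for n in mannheim_nodes]
        lines += [f"{n} np={CORES_PER_NODE} lhd" for n in heidelberg_nodes]
        return lines

    print("\n".join(nodes_file_lines(["n111", "n112"], ["n211", "n212"])))
    # A job is then pinned to one site by requesting the property, e.g.
    #   qsub -l nodes=4:ppn=8:lma job.sh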
5. PERFORMANCE

5.1 MPI performance
In this section we investigate the MPI performance of the interconnected clusters. First we address the question whether it is reasonable to run MPI jobs across the interconnection. Since we had no knowledge of the real workload patterns (and the required interconnect performance) of user jobs at the beginning, we used Linpack as a first estimate. Moreover, since as an IT service provider we have to ensure that all kinds of workloads can be handled by our resources, we chose Linpack as a worst-case workload with strong communication requirements. So the goal of our performance analysis is not a detailed study of Linpack and its optimization, but to obtain answers to questions which we face as a service provider (e.g. buying suitable hardware within cost limits, and operating it). Also, the developed performance model is intended to be very simple and uses only very general assumptions, so that inner effects of the workload can be abstracted away.

For the performance analysis we used the High-Performance Linpack (HPL) benchmark [14] with OpenMPI (version 1.2.8) and Intel MKL (version 10). We measured two variants: (s) runs on a single cluster with up to 1024 CPU cores, and (m) runs on the two coupled clusters with up to 2048 CPU cores symmetrically distributed. For HPL we used square or nearly square process grids and chose parameters which gave the best results for the (s) runs for the maximum problem size of 40000. For the (m) runs we used the same parameters.

Figure 7: Speed-up as a function of the number of processors, local (HPL, different matrix sizes n_p, together with the model curve p/ln p)

Figure 7 shows the speed-up S of the (s) runs in dependence on the number of processors p for different load parameters n_p. For HPL, n_p is the matrix size. Also shown is the speed-up predicted by a simple model, S_M(p) = p/ln p [13]. The ideal speed-up is given by S_1(p) = p for perfectly parallelizable programs.

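For reference, the two model curves used in the figures can be tabulated with a small Python sketch (only the ideal speed-up and the simple p/ln p model; the extended model for the interconnection is developed below):

    import math

    def ideal_speedup(p: int) -> float:
        """S_1(p) = p, a perfectly parallelizable program."""
        return float(p)

    def simple_model_speedup(p: int) -> float:
        """S_M(p) = p / ln p, the simple statistical model from [13]."""
        return p / math.log(p)

    for p in (4, 16, 64, 256, 1024):
        s = simple_model_speedup(p)
        print(f"p={p:5d}  S_1={ideal_speedup(p):7.0f}  S_M={s:7.1f}  efficiency S_M/p={s / p:.2f}")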
It is clearly visible that for high load (n_p = 40000) and p <= 256 the ideal speed-up is closely approached. In this case the efficiency e(p) = S(p)/p varies only slightly and indicates that the internal communication has no or only a small influence. The simple model (all CPU configurations p = 1, ..., 1024 have equal probability), which predicts this speed-up, closely matches the measured values. However, it cannot predict the influence of the load n_p (the typical number of instructions) and of the communication. At least the communication should reduce the speed-up with a growing number of processors p.

The measurements for the second runs (m) are shown in figure 8. We see a speed-up reduced by a factor of 4 for p > 256 compared to variant (s). Moreover, the speed-up becomes constant and so indicates a decreasing efficiency e(p). Figure 9 shows a direct comparison of some results from both runs, logarithmically scaled to emphasize the behavior of the speed-up for small p. Here we see that an acceptable speed-up for applications running across the interconnection can only be expected for p < 50 and if the load is large enough.

Figure 8: Speed-up as a function of the number of processors, remote (HPL, MA-HD, different matrix sizes n_p)

Figure 9: Speed-up as a function of the number of processors, local and remote

Starting from a simple model [13], we developed a model for the interconnect performance taking the characteristics of the InfiniBand connection over 28 km, namely the high latency and the limited bandwidth, into account [12]. Despite qualifying the model as simple, nobody seems to have studied such a setting with our statistical methods, see e.g. [7, 13]. The performance model shows that the latency of 145 µsec for the InfiniBand connection is not the true limiting parameter; the bandwidth of 930 MB/sec is the important factor. The single InfiniBand interconnection between the two clusters is a shared medium for p communicating processing units, so the bandwidth of the channel cannot be neglected. Considering the values for the latency and bandwidth of the interconnection, the result for the speed-up is

    S_c(p) ~ p / ( ln p + (3/4) (l/n_p)^3 (1 + 4p) c(p) ),    (1)

where c(p) is a dimensionless function which represents the communication topology and l is a parameter determined by the latency and bandwidth of the interconnection. The speed-up is also shown in figure 10 for the values n_p = 10000, 20000 and 40000. Analyzing the behaviour of this speed-up we observe a good qualitative reproduction of the measured values. Responsible is the additional term (3/4) (l/n_p)^3 4p c(p) with c(p) ~ (1/2) p^2. The term (3/4) (l/n_p)^3 is the influence of the latency.

Figure 10: Speed-up according to our model

The model also allows us to predict the results for a larger bandwidth. For example, we could double the bandwidth by adding a second fibre line and doubling the equipment. In this case the model shows nearly no further improvement for small problem sizes (n_p <= 20000) and an improvement of only 25% for n_p = 40000. A substantial improvement is only possible with a ten-fold increase of the available bandwidth, which is not worth the financial effort in our situation.

We also investigated whether the interconnection influences MPI jobs which only use nodes on one side of the cluster. Therefore, we set up a long-term measurement of the MPI communication between two nodes under load conditions.

Each time two different nodes, either two in Mannheim or two in Heidelberg, were allocated for the measurements. Figures 11 and 12 display the latency and the bandwidth measurements during three months. During this time the average workload of the cluster was about 80-90%. The results show that the local latency of 2 µsec and the local bandwidth of 1400 MB/sec are available most of the time.

Figure 11: Long-term measurements of the latency between two intra-cluster nodes (IMB 3.2 PingPong)

Figure 12: Long-term measurements of the bandwidth between two intra-cluster nodes (IMB 3.2 PingPong, buffer size 1 GB)

5.2 Storage access performance
As shown in figure 4, one of the two Lustre storage systems is connected to the InfiniBand switch in Heidelberg and the other to the InfiniBand switch in Mannheim. Both storage systems provide disk space for users either temporarily or for the longer term. Therefore, there are potential accesses from all nodes to both storage systems, which leads to continuous traffic across the interconnect. To investigate if there is a difference between the access performance of a compute node to the local storage system and to the remote storage system, we did measurements with the IOzone benchmark [11]. The results shown in figure 13 are averages from ten tests of writing and reading a 32 GB file with a record size of 4 MB. The labels of the columns indicate first the location of the node and then the location of the storage system. We find that the read and write transfer rates do not depend on the location of the storage system. Hence, we can conclude that the limited bandwidth of the interconnection is sufficient for file access. That the performance of Lustre across a wide-area network is comparable to the performance of a locally mounted Lustre file system was expected from similar measurements, e.g. on TeraGrid using 10 GE [24] and more recently on a testbed using Obsidian Longbow InfiniBand routing [18] like in our setup. Our measurements were performed during normal workload traffic, which confirms the suitability of Lustre over InfiniBand over fibre optics as a local-area file system.

Figure 13: Write and read rates from a cluster node to a Lustre storage system (MA-HD, MA-MA, HD-HD, HD-MA)
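A measurement of this kind can be scripted along the following lines (a sketch only; the IOzone options select the write and read tests with the file and record sizes given above, while the target path and the handling of the reports are assumptions):

    import subprocess

    TARGET = "/lustre/bwfs/testfile"   # hypothetical path on one of the Lustre systems
    RUNS = 10                          # the results above are averages over ten tests

    def run_iozone() -> str:
        """One IOzone run: sequential write (-i 0) and read (-i 1) of a 32 GB file
        with a 4 MB record size; returns the raw text report for later parsing."""
        cmd = ["iozone", "-i", "0", "-i", "1", "-s", "32g", "-r", "4m", "-f", TARGET]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    reports = [run_iozone() for _ in range(RUNS)]
    print(f"collected {len(reports)} IOzone reports")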
6. SUMMARY AND CONCLUSIONS
We discussed operational and organizational issues of an InfiniBand interconnection between two clusters over a distance of 28 km. We described the setup of hardware and networking components, and the solution of technical and organizational integration problems. The characteristics of the interconnection are a high point-to-point latency of 145 µsec and a bandwidth of 930 MB/sec according to the specification.

One of the main statements of this report is that the optical transmission over 28 km and the InfiniBand switches and adaptors are very stable and work reliably in day-to-day use. Interruptions were only caused by general maintenance activities on the dark fibre connection between the two universities. Furthermore, a bandwidth of the order of 1000 MB/sec is sufficient to access a Lustre file system across this kind of long-distance interconnection without noticeable performance impact.

This allows the administration and operation of distributed clusters and storage elements as a single system with a common user administration and a common PBS. The advantages of such a system are a higher resource utilization, a better load balance and lower administration costs. For the common user administration we presented a concept for a federated authorization system with direct access to the user management systems of separate organizations. This task involves some challenges, since it requires inter-organizational policies and commitments. But the gains from this effort are a further reduction of administration costs and a low access barrier for potential users, who do not need to apply for an extra cluster account, but can use their university account to access the system.

The combined system has one shortcoming. The characteristics of the interconnection are not sufficient for all kinds of MPI jobs spanning nodes on the two sides of the cluster. Communication-intensive jobs across the cluster would be very slow, since the interconnection is a shared medium where all processes use a single line for the whole communication. The situation could be improved by adding more parallel fibre lines, which would be very expensive. A more flexible configuration of the job scheduler remains an issue for further investigation.

7. ACKNOWLEDGMENTS
We thank our colleagues Rolf Bogus and Hermann Lauer as well as the bwgrid team for their help in the construction and operation of the interesting hardware and the optimization of the connection, which is the basis for this report. Thanks also to Erich Strohmaier for comments and suggestions for this work. bwgrid is a member of the German D-Grid initiative and is funded by the Ministry for Education and Research (Bundesministerium für Bildung und Forschung), the Ministry for Science, Research and Arts Baden-Württemberg (Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg), and the Universities of Baden-Württemberg.

8. REFERENCES
[1] S. Carter, M. Minich, and N. Rao. Experimental evaluation of InfiniBand transport over local- and wide-area networks. In Proceedings of the Multiconference SpringSim 2007, 2007.
[2] Cluster Resources Inc. Moab Grid Suite and Maui Cluster Scheduler. [accessed May 2010], 2010.
[3] DFN AAI. DFN-Verein authentication and authorization infrastructure (AAI). [accessed May 2010], 2010.
[4] Diogenes Team. DIstributed Optimal GENEtic algorithm for grid applications Scheduling. diogenes.grid.pub.ro/ [accessed May 2010], 2010.
[5] Distributed ASCI Supercomputer. Six-cluster wide-area distributed system for Grid research in the Netherlands. [accessed Apr 2010], 2010.
[6] eduroam. Secure, world-wide roaming access service developed for the international research and education community. [accessed June 2011], 2011.
[7] E. Gelenbe. Multiprocessor Performance. Wiley.
[8] Grid 5000. Scientific instrument for the study of large scale parallel and distributed systems in France. [accessed Apr 2010], 2010.
[9] HLRS. MPI benchmark of InfiniBand over fiber optics. [accessed May 2010], 2008.
[10] G. V. Iordache, M. S. Boboila, F. Pop, C. Stratan, and V. Cristea. A decentralized strategy for genetic scheduling in heterogeneous environments. J. Multiagent and Grid Systems, 2007.
[11] IOzone. Filesystem Benchmark. [accessed June 2011], 2011.
[12] H. Kredel, H.-G. Kruse, and S. Richling. Zur Leistung von verteilten, homogenen Clustern. PIK, 2010.
[13] H.-G. Kruse. Leistungsbewertung bei Computer-Systemen. Springer Verlag, Heidelberg, 2009.
[14] LinPack and HPL. Linear Algebra Package and High Performance Linpack. benchmark/hpl/ [accessed Nov 2009], 2009.
[15] M. Merz and M. Krietemeyer. IPACS - Integrated Performance Analysis of Computer Systems - Benchmarks for Distributed Computer Systems. Logos Verlag, Berlin, 2006.
[16] Moonshot Project. Development of a single unifying technology for extending the benefits of federated identity to a broad range of non-web services. [accessed June 2011], 2011.
[17] Obsidian. High performance network. [accessed May 2010], 2010.
[18] X. Ouyang, H. Subramoni, M. Arnold, and D. K. Panda. Filesystem Performance Evaluation of Obsidian Longbow Routers. [accessed June 2011], 2011.
[19] A. Plaat, H. E. Bal, R. F. H. Hofman, and T. Kielmann. Sensitivity of parallel applications to large differences in bandwidth and latency in two-layer interconnects. Future Generation Computer Systems, 2001.
[20] C. Prescott and C. A. Taylor. Comparative Performance Analysis of Obsidian Longbow InfiniBand Range-Extension Technology. prescott taylor.pdf [accessed May 2010], 2010.
[21] S. Richling, S. Hau, H. Kredel, and H.-G. Kruse. Operating two InfiniBand grid clusters over 28 km distance. Int. J. Grid and Utility Computing, 2(4):303-312, 2011.
[22] U. Schlegel, K. Grobe, and D. Southwell. 10 Gb/s DWDM InfiniBand Transport over up to 40 km. file id=1 [accessed May 2010], 2010.
[23] Shibboleth. Internet2 security architecture for single sign-on across or within organizational boundaries. shibboleth.internet2.edu/ [accessed May 2010], 2010.
[24] S. C. Simms, G. G. Pike, and D. Balog. Wide Area Filesystem Performance using Lustre on TeraGrid. In TeraGrid Conference, 2007.
[25] W. Yu, N. Rao, and J. S. Vetter. Experimental analysis of InfiniBand transport services on WAN. In Proceedings of the International Conference on Networking, Architecture, and Storage, 2008.


More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

Functional Requirements for Grid Oriented Optical Networks

Functional Requirements for Grid Oriented Optical Networks Functional Requirements for Grid Oriented Optical s Luca Valcarenghi Internal Workshop 4 on Photonic s and Technologies Scuola Superiore Sant Anna Pisa June 3-4, 2003 1 Motivations Grid networking connection

More information

Bluemin: A Suite for Management of PC Clusters

Bluemin: A Suite for Management of PC Clusters Bluemin: A Suite for Management of PC Clusters Hai Jin, Hao Zhang, Qincheng Zhang, Baoli Chen, Weizhong Qiang School of Computer Science and Engineering Huazhong University of Science and Technology Wuhan,

More information

Designing SAN Using Cisco MDS 9000 Series Fabric Switches

Designing SAN Using Cisco MDS 9000 Series Fabric Switches White Paper Designing SAN Using Cisco MDS 9000 Series Fabric Switches September 2016 2016 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page 1 of 15 Contents What You

More information

SCS Distributed File System Service Proposal

SCS Distributed File System Service Proposal SCS Distributed File System Service Proposal Project Charter: To cost effectively build a Distributed networked File Service (DFS) that can grow to Petabyte scale, customized to the size and performance

More information

Integration of Cloud and Grid Middleware at DGRZR

Integration of Cloud and Grid Middleware at DGRZR D- of International Symposium on Computing 2010 Stefan Freitag Robotics Research Institute Dortmund University of Technology March 12, 2010 Overview D- 1 D- Resource Center Ruhr 2 Clouds in the German

More information

Brutus. Above and beyond Hreidar and Gonzales

Brutus. Above and beyond Hreidar and Gonzales Brutus Above and beyond Hreidar and Gonzales Dr. Olivier Byrde Head of HPC Group, IT Services, ETH Zurich Teodoro Brasacchio HPC Group, IT Services, ETH Zurich 1 Outline High-performance computing at ETH

More information

The Red Storm System: Architecture, System Update and Performance Analysis

The Red Storm System: Architecture, System Update and Performance Analysis The Red Storm System: Architecture, System Update and Performance Analysis Douglas Doerfler, Jim Tomkins Sandia National Laboratories Center for Computation, Computers, Information and Mathematics LACSI

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

NUSGRID a computational grid at NUS

NUSGRID a computational grid at NUS NUSGRID a computational grid at NUS Grace Foo (SVU/Academic Computing, Computer Centre) SVU is leading an initiative to set up a campus wide computational grid prototype at NUS. The initiative arose out

More information

MVAPICH MPI and Open MPI

MVAPICH MPI and Open MPI CHAPTER 6 The following sections appear in this chapter: Introduction, page 6-1 Initial Setup, page 6-2 Configure SSH, page 6-2 Edit Environment Variables, page 6-5 Perform MPI Bandwidth Test, page 6-8

More information

Parallel File Systems Compared

Parallel File Systems Compared Parallel File Systems Compared Computing Centre (SSCK) University of Karlsruhe, Germany Laifer@rz.uni-karlsruhe.de page 1 Outline» Parallel file systems (PFS) Design and typical usage Important features

More information

vsphere Installation and Setup Update 2 Modified on 10 JULY 2018 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5

vsphere Installation and Setup Update 2 Modified on 10 JULY 2018 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5 vsphere Installation and Setup Update 2 Modified on 10 JULY 2018 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5 You can find the most up-to-date technical documentation on the VMware website at:

More information

New trends in Identity Management

New trends in Identity Management New trends in Identity Management Peter Gietz, DAASI International GmbH peter.gietz@daasi.de Track on Research and Education Networking in South East Europe, Yu Info 2007, Kopaionik, Serbia 14 March 2007

More information

Pushing the Limits. ADSM Symposium Sheelagh Treweek September 1999 Oxford University Computing Services 1

Pushing the Limits. ADSM Symposium Sheelagh Treweek September 1999 Oxford University Computing Services 1 Pushing the Limits ADSM Symposium Sheelagh Treweek sheelagh.treweek@oucs.ox.ac.uk September 1999 Oxford University Computing Services 1 Overview History of ADSM services at Oxford October 1995 - started

More information

Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters

Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters Kent Milfeld, Avijit Purkayastha, Chona Guiang Texas Advanced Computing Center The University of Texas Austin, Texas USA Abstract

More information

Introduction to Cluster Computing

Introduction to Cluster Computing Introduction to Cluster Computing Prabhaker Mateti Wright State University Dayton, Ohio, USA Overview High performance computing High throughput computing NOW, HPC, and HTC Parallel algorithms Software

More information

Pass-Through Technology

Pass-Through Technology CHAPTER 3 This chapter provides best design practices for deploying blade servers using pass-through technology within the Cisco Data Center Networking Architecture, describes blade server architecture,

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

Cloud Computing For Researchers

Cloud Computing For Researchers Cloud Computing For Researchers August, 2016 Compute Canada is often asked about the potential of outsourcing to commercial clouds. This has been investigated as an alternative or supplement to purchasing

More information

iscsi Technology: A Convergence of Networking and Storage

iscsi Technology: A Convergence of Networking and Storage HP Industry Standard Servers April 2003 iscsi Technology: A Convergence of Networking and Storage technology brief TC030402TB Table of Contents Abstract... 2 Introduction... 2 The Changing Storage Environment...

More information

Boundary control : Access Controls: An access control mechanism processes users request for resources in three steps: Identification:

Boundary control : Access Controls: An access control mechanism processes users request for resources in three steps: Identification: Application control : Boundary control : Access Controls: These controls restrict use of computer system resources to authorized users, limit the actions authorized users can taker with these resources,

More information

bwfortreff bwhpc user meeting

bwfortreff bwhpc user meeting bwfortreff bwhpc user meeting bwhpc Competence Center MLS&WISO Universitätsrechenzentrum Heidelberg Rechenzentrum der Universität Mannheim Steinbuch Centre for Computing (SCC) Funding: www.bwhpc-c5.de

More information

Chapter 11: Implementing File Systems

Chapter 11: Implementing File Systems Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation

More information

MAHA. - Supercomputing System for Bioinformatics

MAHA. - Supercomputing System for Bioinformatics MAHA - Supercomputing System for Bioinformatics - 2013.01.29 Outline 1. MAHA HW 2. MAHA SW 3. MAHA Storage System 2 ETRI HPC R&D Area - Overview Research area Computing HW MAHA System HW - Rpeak : 0.3

More information

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data

More information

Readme for Platform Open Cluster Stack (OCS)

Readme for Platform Open Cluster Stack (OCS) Readme for Platform Open Cluster Stack (OCS) Version 4.1.1-2.0 October 25 2006 Platform Computing Contents What is Platform OCS? What's New in Platform OCS 4.1.1-2.0? Supported Architecture Distribution

More information

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage Evaluation of Lustre File System software enhancements for improved Metadata performance Wojciech Turek, Paul Calleja,John

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

Evaluating the Impact of RDMA on Storage I/O over InfiniBand

Evaluating the Impact of RDMA on Storage I/O over InfiniBand Evaluating the Impact of RDMA on Storage I/O over InfiniBand J Liu, DK Panda and M Banikazemi Computer and Information Science IBM T J Watson Research Center The Ohio State University Presentation Outline

More information

Lenovo ThinkSystem NE Release Notes. For Lenovo Cloud Network Operating System 10.6

Lenovo ThinkSystem NE Release Notes. For Lenovo Cloud Network Operating System 10.6 Lenovo ThinkSystem NE10032 Release Notes For Lenovo Cloud Network Operating System 10.6 Note: Before using this information and the product it supports, read the general information in the Safety information

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency

More information

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world

More information

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering

More information