ANALYSIS OF CLUSTER INTERCONNECTION NETWORK TOPOLOGIES
Sergio N. Zapata, David H. Williams and Patricia A. Nava
Department of Electrical and Computer Engineering
The University of Texas at El Paso, El Paso, TX

Abstract

Cluster computing provides an economical alternative for high performance computing that in the past could only be provided by expensive parallel supercomputers. Clusters are built with standard components and interconnected by various interconnection topologies, which provide different approaches for communication between processing nodes within the cluster. A study has been performed to evaluate the computing and network speeds of a cluster of nine computers, consisting of a front-end and eight compute nodes, interconnected in star, channel bonding, and flat neighborhood network topologies. For this task two applications were developed: one performs a data transfer test between two nodes and measures the round trip time of the transfer; the second performs distributed matrix multiplication using all of the compute nodes. In addition, the High Performance Linpack (HPL) benchmark was utilized. These applications were run with the cluster network configured in each of the three aforementioned topologies. Results show that 2-way channel bonding is the best alternative, providing a peak performance of 12 GFLOPS. The Flat Neighborhood Network proved to be effective, but at a higher cost, since at least one extra switch and one extra NIC per processing node were required.

Keywords: Cluster Computing, Interconnection Networks

1.0 Introduction

Cluster computing¹ has become an economical solution to satisfy the need for supercomputing power. This approach is achieved by clustering two or more computers built from mass-market commodity off-the-shelf (M2COTS) components.
¹ This work was supported in part by an equipment grant from Cisco Systems, and by the National Science Foundation under grant #EIA.

Commodity components make clustering extremely cost effective, with an excellent price-performance curve [1]. In addition, with the development of free software, operating systems, and tools, this design method has become even more appealing. The objective of this work was to build a small Beowulf cluster and to investigate its performance under three different interconnection network topologies [2]. In order to test the overall performance of the cluster, a well-known benchmark was utilized: High Performance Linpack (HPL). HPL performs a series of matrix operations that evaluate the computing power of the compute nodes, and distributes the processing in different ways, measuring the ability of the network topology. In addition, two other applications were developed to test the network performance: a data transfer rate test between two CPUs, to measure the raw communication speed, and a matrix multiplication distributed to multiple CPUs.

2.0 Test Methods

2.1 Hardware Configuration

Each node within the system was constructed using a Gigabyte GA-7VKMP motherboard, with one AMD Athlon XP CPU, 512 MB of memory, CD-ROM and floppy drives, and a 60 GB disk drive. On the compute nodes, the disk is employed for system software; on the front-end node, user files are also stored. Two 10/100 D-Link network cards were added to each node, in addition to the 10/100 NIC contained on the motherboard. Each computer, then, included a total of three 100 Mb/sec Ethernet NICs, to provide multiple communication channels for the different network topologies. Three topologies were tested: star, channel bonding, and flat neighborhood network configurations.

The star topology is the most common configuration employed in a local area network. Its most basic form consists of a network switch at the center, with all computers directly linked to it, forming a shape resembling a star as shown in Figure 1.

Figure 1. Star Topology

Channel bonding consists of logically striping (enslaving) N network interfaces, where N ≥ 2, making them work as one. This can theoretically increase the bandwidth by a factor of N. In addition to fulfilling bandwidth requirements, channel bonding can also be used for high availability by means of redundancy, or to service multiple network segments. In data bandwidth channel bonding, the data packets are sent to the output devices (NICs) in round-robin order. The trunk shares the Medium Access Control (MAC) address of the first enslaved device among all the member NICs of the trunk; thus, regardless of which interface a request was received on, the response is always sent from the next available NIC. Each NIC must be connected to a different switch, as shown in Figure 2. The switches perform layer-two routing, which is done through the MAC addresses of the packets, thereby preventing collisions between traffic from different nodes.

Figure 2. Channel Bonding Topology

The Flat Neighborhood Network (FNN) was developed at the University of Kentucky [3] and has the main characteristic of low latency. In typical scenarios, when interconnecting large computer clusters (more than 1000 nodes), switch fabrics are employed which, in addition to being very expensive, add latency to the network. A FNN achieves low latency by having only one switch connecting any two nodes in the cluster [3]. Figure 3 shows the FNN configuration employed for this study. As can be seen from the figure, the interconnection network consists of three different subnets that allow each machine to communicate with every other machine through only one intermediate switch.

Figure 3. Flat Neighborhood Network

2.2 Software Configuration

In operating a cluster, it is desirable to have software that maintains the homogeneity of the system software on all of the nodes and provides tools to manage and monitor the operation of the system. For our system, the Rocks Cluster Distribution developed by the National Partnership for Advanced Computational Infrastructure [4] was chosen for this task, since it includes all of the necessary tools for administration, it is open software, and it is based on the very well known Red Hat Linux distribution. Currently we employ version 3.6 of Rocks.

2.3 Performance Evaluation Software

Three test suites were employed to test the communications and/or computational performance of the system: a data transfer program that transfers large numbers of packets between two nodes and measures the round trip time of the transfer; a distributed matrix multiplication program for overall performance evaluation; and the well-known High Performance Linpack (HPL) benchmark to test overall system speed.

The data transfer rate test was devised in order to better understand the behavior of the topology being tested, and is based on the client-server model. The client creates a short integer array with random values and, using sockets, transmits the array to the server. The server receives the data, stores it in an array, and resends, or echoes back, the same data to the client. The type and amount of data to be sent between the client and the server were determined based on the size of an Ethernet frame. Excluding protocol overhead, the theoretical amount of data that can be sent per frame is 1460 bytes [1], but with the aid of the Ethereal [5] sniffer, the amount of data that could actually be sent was determined to be 1448 bytes, or 724 short elements. The effectiveness of the network is determined by measuring the data round trip time. The amount of data increases in intervals of 724 elements, forcing the transfer of data into an integer number of Ethernet frames; no partial packets are created. The data size was increased from 1 to 10,000 frames, in increments of one frame (724 short integers), so the minimum data sent is 1,448 bytes and the maximum is approximately 14 Mbytes.
The distributed matrix multiplication program was used to measure the correlation between processing power and network speed in determining cluster performance, and consists of a client and a server program. The client program reads two random-valued large matrices, up to 48 million double elements in size, into memory. The client then forks up to 8 child processes, one per server. Each child uses sockets to send one matrix and 1/N of the other matrix to its corresponding server, where N is the number of servers. Each server reads the data from the socket, performs the matrix multiplication, and returns the results. At the client side, each child, after reading the partial results from its server, writes them to a shared memory region accessible by the client. When all children have returned, the client prints the resulting matrix. This scenario was built to mimic a more realistic data transfer operation, since the packets produced are no longer forced to fit perfectly in an Ethernet frame, and the computational performance of the cluster is included.

The HPL [6] program, the standard benchmark for measuring cluster speed [7], solves a dense linear system of double precision arithmetic values. To accomplish this task, and to best accommodate the system architecture being benchmarked, HPL provides various configuration options. These include where the output of the benchmark is to be written; the problem sizes, which are directly dependent on the amount of memory; the block sizes into which the operation is divided; the process grids, which are multiples of the number of processors available in the system; and the method of factorization (left-looking variant, Crout variant, or right-looking variant).
Among the most important factors for this work is the choice among the different virtual panel broadcast topologies for data distribution and processor communication, which directly affect cluster performance depending on the interconnection network topology.
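HPL reports its result as a FLOP rate derived from the conventional LU-solve operation count, and that arithmetic is easy to reproduce. In the sketch below the 444.5-second run time is a hypothetical figure, chosen only to show that a 20,000-element problem at roughly that speed lands in the 12 GFLOPS range reported later:

```python
def hpl_gflops(n, seconds):
    """HPL's conventional operation count for an n x n dense solve:
    (2/3)n^3 + 2n^2 floating point operations, divided by run time."""
    return ((2.0 / 3.0) * n**3 + 2.0 * n**2) / seconds / 1e9

# A 20,000-element problem solved in about 444.5 s (hypothetical
# timing) works out to roughly 12 GFLOPS:
print(round(hpl_gflops(20000, 444.5), 1))
```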
3.0 Results and Conclusions

3.1 Data Transfer Rate Test

The Data Transfer Rate Test (DTRT) is executed between every pair of nodes in the cluster, with transfer data sizes from 1,448 to 14,480,000 bytes, in 10,000 iterations. All data sets are multiples of 1,448 bytes to completely fill each Ethernet frame without creating partial frames. The DTRT times were measured using the gettimeofday() system call. The DTRT was only performed on the star and 2-way channel bonding networks, since the results from the star topology can be extrapolated to the FNN case: the star topology and the Flat Neighborhood Network share the same model at the pair-wise level, as both layouts have only a single direct link through one switch between any two nodes. After the results from the 2-way channel bonding were obtained, further testing was accomplished by means of an additional switch, allowing for 3-way channel bonding. Figure 4 displays the results from the DTRT.

Figure 4. Data Transfer Rate Test Results

The results show that the cost-effectiveness of the channel-bonded networks becomes more apparent as the size of the data increases. Small data sets show no improvement of channel-bonded networks over the traditional star topology or the FNN; however, as the size increases, the 2-way channel-bonded network presents an improvement of 1.9, close to the theoretical limit of 2.0 that would be expected. The addition of a third interface to the channel bonding scheme shows an improvement of 2.9 over the star topology, against a theoretical improvement of 3.0. Up to this point, channel bonding seems to be a cost-effective alternative.

3.2 Distributed Matrix Multiplication

The Distributed Matrix Multiplication Test (DMMT) gives a more realistic overall measure of network performance, because the data sent between the machines is not forced to avoid partial packet creation, and the results include computing factors. The DMMT involves multiplying two very large matrices whose size depends on the number of servers; the matrix dimensions are Nx and xN, where N = {2, 4, 6, 8} is the number of servers. Figure 5 illustrates the timing results of the DMMT for each of the interconnection network topology types.

Figure 5. Timing Results for the DMMT

Figures 4 and 5 illustrate that 2-way channel bonding gives the best overall performance, since it scales linearly and its performance increase over the star topology is close to 2. On the other hand, 3-way channel bonding gives the largest peak performance increase, but does not scale linearly. The maximum performance increase for 3-way channel bonding was reached when 4 servers were active, for a performance increase factor of 2.8. For the most intensive cases, using 6 and 8 servers, the performance increase is only around 2.4 times over the star topology. These results help to conclude that having an extra network interface, in a system performing actual computations, may not be cost-effective.

Many factors can cause this nonideal performance of 3-way channel bonding. One reason may be the nonsymmetric networking approach utilized, and the inability of the driver to properly handle an odd number of network interface cards. Another reason may be that the high number of partial packets created, and the collision overhead caused by these packets, directly affect the performance of the network. Lastly, the network queues may be served out of order and therefore be used ineffectually by the bonding driver.

The FNN was tested in two different ways: one forcing all of the data distribution and computation possible through one switch, and an alternative approach using two switches for the communication. The performance of the first approach is close to that of a single-channel star configuration, since only one switch is available for data distribution. The alternative approach shows performance close to 2-way channel bonding, because data are distributed using all of the bandwidth that two switches can provide.

3.3 High Performance Linpack

HPL was executed with various problem sizes, process grids, block sizes, and virtual broadcast topologies. The problem sizes utilized were 2,000; 5,000; 10,000; 15,000; and 20,000 elements, except in one special circumstance in which a problem size of 18,000 elements was employed. Since the process grids depend on the number of processors employed, and noting that up to 9 single-processor nodes (1 front-end and 8 compute nodes) are available, process grids of 1x8, 2x4, and 3x3 (the last using the front-end node as well) were utilized.
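The P x Q process grids HPL accepts are simply the factor pairs of the processor count. A small helper (hypothetical, not part of HPL itself) reproduces the grids used above:

```python
def process_grids(p):
    """All P x Q factorizations of p with P <= Q, the shapes HPL accepts."""
    return [(i, p // i) for i in range(1, int(p**0.5) + 1) if p % i == 0]

print(process_grids(8))  # [(1, 8), (2, 4)]  -> the 1x8 and 2x4 grids
print(process_grids(9))  # [(1, 9), (3, 3)]  -> 3x3 uses the front-end too
```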
Data distribution takes place through the virtual broadcast topologies. There exist six different topologies, which make full use of the network by distributing the data in a ring and by employing different pairwise distribution schemes [6]. Although all of the broadcast topologies were tested, the results presented below are only from the best broadcast topology for this application.

Figure 6 shows the performance of the system when configured using the three network topologies. The figure illustrates that 2-way channel bonding is the best, by a factor of around 2 GFLOPS over the other two topologies. (3-way channel bonding was not included in this test because, as previously stated, it did not scale well.) From the figure it can also be concluded that a basic FNN is not cost-effective compared to channel bonding, since it provides only minor improvements over the single-channel, or star, topology. Setting up the FNN is much more expensive than the star topology, or even 2-way channel bonding, since it requires three switches and special routing tables for each node.

Figure 6. HPL Performance in FLOPS, 2x4 Process Grids and 210 Block Size

The performance of 8 processors was compared to the performance of 9 processors by using the front-end as a compute node. The maximum output in both cases was very close and, in the cases of the other two networks, when all nine nodes (3x3) were computing, the performance decreased compared with the previous case of 8 (2x4) processors. The most plausible reason is that the network reached its limit; in other words, network contention dominates.

3.4 Conclusions

Test protocols were implemented for each of the three topologies: the Data Transfer Rate Test (DTRT), Distributed Matrix Multiplication (DMMT), and High Performance Linpack (HPL). The DTRT showed that, in general, channel bonding is the best option to implement, giving improvement factors close to the number of NICs attached to each node. On the other hand, the DMMT, being closer to a real-life application, showed that true cost effectiveness is only reached when using 2-way channel bonding. HPL tested overall system performance, peaking at 12 GFLOPS with 2-way channel bonding. So, all in all, 2-way channel bonding provided the fastest, least expensive, and most scalable communications.

4.0 References

[1] Sterling, Thomas. Beowulf Cluster Computing with Linux. The MIT Press, 2002.
[2] Zapata, Sergio N. Analysis of Cluster Interconnection Network Topologies. M.S. Thesis, The University of Texas at El Paso, July.
[3] Dietz, Hank. FNN: Flat Neighborhood Network. University of Kentucky.
[4] NPACI. Rocks Cluster Distribution Users Guide, 2003.
[5] Combs, Gerald. Ethereal: The world's most popular network protocol analyzer. 2004.
[6] Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. Innovative Computing Laboratory, University of Tennessee, Computer Science Department.
[7] Top 500 Supercomputer Sites. University of Mannheim, University of Tennessee, National Energy Research Scientific Computing Center.
Hot Interconnects 2014 End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet Green Platform Research Laboratories, NEC, Japan J. Suzuki, Y. Hayashi, M. Kan, S. Miyakawa,
More informationLINUX. Benchmark problems have been calculated with dierent cluster con- gurations. The results obtained from these experiments are compared to those
Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications Michael Eberl 1, Wolfgang Karl 1, Carsten Trinitis 1 and Andreas Blaszczyk 2 1 Technische Universitat Munchen
More informationAssessing performance in HP LeftHand SANs
Assessing performance in HP LeftHand SANs HP LeftHand Starter, Virtualization, and Multi-Site SANs deliver reliable, scalable, and predictable performance White paper Introduction... 2 The advantages of
More informationNVIDIA nforce IGP TwinBank Memory Architecture
NVIDIA nforce IGP TwinBank Memory Architecture I. Memory Bandwidth and Capacity There s Never Enough With the recent advances in PC technologies, including high-speed processors, large broadband pipelines,
More informationIBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide
V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication
More informationInfoBrief. Platform ROCKS Enterprise Edition Dell Cluster Software Offering. Key Points
InfoBrief Platform ROCKS Enterprise Edition Dell Cluster Software Offering Key Points High Performance Computing Clusters (HPCC) offer a cost effective, scalable solution for demanding, compute intensive
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationChapter 13: Mass-Storage Systems. Disk Scheduling. Disk Scheduling (Cont.) Disk Structure FCFS. Moving-Head Disk Mechanism
Chapter 13: Mass-Storage Systems Disk Scheduling Disk Structure Disk Scheduling Disk Management Swap-Space Management RAID Structure Disk Attachment Stable-Storage Implementation Tertiary Storage Devices
More informationChapter 13: Mass-Storage Systems. Disk Structure
Chapter 13: Mass-Storage Systems Disk Structure Disk Scheduling Disk Management Swap-Space Management RAID Structure Disk Attachment Stable-Storage Implementation Tertiary Storage Devices Operating System
More informationLUSTRE NETWORKING High-Performance Features and Flexible Support for a Wide Array of Networks White Paper November Abstract
LUSTRE NETWORKING High-Performance Features and Flexible Support for a Wide Array of Networks White Paper November 2008 Abstract This paper provides information about Lustre networking that can be used
More informationSAP High-Performance Analytic Appliance on the Cisco Unified Computing System
Solution Overview SAP High-Performance Analytic Appliance on the Cisco Unified Computing System What You Will Learn The SAP High-Performance Analytic Appliance (HANA) is a new non-intrusive hardware and
More informationExtending the LAN. Context. Info 341 Networking and Distributed Applications. Building up the network. How to hook things together. Media NIC 10/18/10
Extending the LAN Info 341 Networking and Distributed Applications Context Building up the network Media NIC Application How to hook things together Transport Internetwork Network Access Physical Internet
More informationNetwork Design Considerations for Grid Computing
Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom
More informationPLANEAMENTO E GESTÃO DE REDES INFORMÁTICAS COMPUTER NETWORKS PLANNING AND MANAGEMENT
Mestrado em Engenharia Informática e de Computadores PLANEAMENTO E GESTÃO DE REDES INFORMÁTICAS COMPUTER NETWORKS PLANNING AND MANAGEMENT 2010-2011 Tecnologias de Redes Informáticas - I - Computer Networks
More informationDiffusion TM 5.0 Performance Benchmarks
Diffusion TM 5.0 Performance Benchmarks Contents Introduction 3 Benchmark Overview 3 Methodology 4 Results 5 Conclusion 7 Appendix A Environment 8 Diffusion TM 5.0 Performance Benchmarks 2 1 Introduction
More informationGUIDE. Optimal Network Designs with Cohesity
Optimal Network Designs with Cohesity TABLE OF CONTENTS Introduction...3 Key Concepts...4 Five Common Configurations...5 3.1 Simple Topology...5 3.2 Standard Topology...6 3.3 Layered Topology...7 3.4 Cisco
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationEmulex Universal Multichannel
Emulex Universal Multichannel Reference Manual Versions 11.2 UMC-OCA-RM112 Emulex Universal Multichannel Reference Manual Corporate Headquarters San Jose, CA Website www.broadcom.com Broadcom, the pulse
More informationOptimizing LS-DYNA Productivity in Cluster Environments
10 th International LS-DYNA Users Conference Computing Technology Optimizing LS-DYNA Productivity in Cluster Environments Gilad Shainer and Swati Kher Mellanox Technologies Abstract Increasing demand for
More informationA Distance Learning Tool for Teaching Parallel Computing 1
A Distance Learning Tool for Teaching Parallel Computing 1 RAFAEL TIMÓTEO DE SOUSA JR., ALEXANDRE DE ARAÚJO MARTINS, GUSTAVO LUCHINE ISHIHARA, RICARDO STACIARINI PUTTINI, ROBSON DE OLIVEIRA ALBUQUERQUE
More informationCS 4453 Computer Networks Winter
CS 4453 Computer Networks Chapter 2 OSI Network Model 2015 Winter OSI model defines 7 layers Figure 1: OSI model Computer Networks R. Wei 2 The seven layers are as follows: Application Presentation Session
More informationWebSphere Application Server Base Performance
WebSphere Application Server Base Performance ii WebSphere Application Server Base Performance Contents WebSphere Application Server Base Performance............. 1 Introduction to the WebSphere Application
More informationACCRE High Performance Compute Cluster
6 중 1 2010-05-16 오후 1:44 Enabling Researcher-Driven Innovation and Exploration Mission / Services Research Publications User Support Education / Outreach A - Z Index Our Mission History Governance Services
More informationUnderutilizing Resources for HPC on Clouds
Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-21/6-1 1/June/21 Underutilizing Resources for HPC on Clouds R. Iakymchuk, J. Napper and P. Bientinesi Financial
More informationMission-Critical Databases in the Cloud. Oracle RAC in Microsoft Azure Enabled by FlashGrid Software.
Mission-Critical Databases in the Cloud. Oracle RAC in Microsoft Azure Enabled by FlashGrid Software. White Paper rev. 2017-10-16 2017 FlashGrid Inc. 1 www.flashgrid.io Abstract Ensuring high availability
More informationPerformance of Variant Memory Configurations for Cray XT Systems
Performance of Variant Memory Configurations for Cray XT Systems Wayne Joubert, Oak Ridge National Laboratory ABSTRACT: In late 29 NICS will upgrade its 832 socket Cray XT from Barcelona (4 cores/socket)
More informationQuiz for Chapter 6 Storage and Other I/O Topics 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [6 points] Give a concise answer to each of the following
More informationWhy Your Application only Uses 10Mbps Even the Link is 1Gbps?
Why Your Application only Uses 10Mbps Even the Link is 1Gbps? Contents Introduction Background Information Overview of the Issue Bandwidth-Delay Product Verify Solution How to Tell Round Trip Time (RTT)
More informationParallel Computer Architecture II
Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de
More informationMultifunction Networking Adapters
Ethernet s Extreme Makeover: Multifunction Networking Adapters Chuck Hudson Manager, ProLiant Networking Technology Hewlett-Packard 2004 Hewlett-Packard Development Company, L.P. The information contained
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationV. Mass Storage Systems
TDIU25: Operating Systems V. Mass Storage Systems SGG9: chapter 12 o Mass storage: Hard disks, structure, scheduling, RAID Copyright Notice: The lecture notes are mainly based on modifications of the slides
More informationComposite Metrics for System Throughput in HPC
Composite Metrics for System Throughput in HPC John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2003 Phoenix, AZ November 18, 2003 Overview The HPC Challenge Benchmark was announced last
More informationAPENet: LQCD clusters a la APE
Overview Hardware/Software Benchmarks Conclusions APENet: LQCD clusters a la APE Concept, Development and Use Roberto Ammendola Istituto Nazionale di Fisica Nucleare, Sezione Roma Tor Vergata Centro Ricerce
More informationChapter 10: Mass-Storage Systems
COP 4610: Introduction to Operating Systems (Spring 2016) Chapter 10: Mass-Storage Systems Zhi Wang Florida State University Content Overview of Mass Storage Structure Disk Structure Disk Scheduling Disk
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationCopyright 2009 by Scholastic Inc. All rights reserved. Published by Scholastic Inc. PDF0090 (PDF)
Enterprise Edition Version 1.9 System Requirements and Technology Overview The Scholastic Achievement Manager (SAM) is the learning management system and technology platform for all Scholastic Enterprise
More informationiscsi Technology Brief Storage Area Network using Gbit Ethernet The iscsi Standard
iscsi Technology Brief Storage Area Network using Gbit Ethernet The iscsi Standard On February 11 th 2003, the Internet Engineering Task Force (IETF) ratified the iscsi standard. The IETF was made up of
More informationThe Oracle Database Appliance I/O and Performance Architecture
Simple Reliable Affordable The Oracle Database Appliance I/O and Performance Architecture Tammy Bednar, Sr. Principal Product Manager, ODA 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.
More informationBig Orange Bramble. August 09, 2016
Big Orange Bramble August 09, 2016 Overview HPL SPH PiBrot Numeric Integration Parallel Pi Monte Carlo FDS DANNA HPL High Performance Linpack is a benchmark for clusters Created here at the University
More informationBrutus. Above and beyond Hreidar and Gonzales
Brutus Above and beyond Hreidar and Gonzales Dr. Olivier Byrde Head of HPC Group, IT Services, ETH Zurich Teodoro Brasacchio HPC Group, IT Services, ETH Zurich 1 Outline High-performance computing at ETH
More informationScalable and Fault Tolerant Failure Detection and Consensus
EuroMPI'15, Bordeaux, France, September 21-23, 2015 Scalable and Fault Tolerant Failure Detection and Consensus Amogh Katti, Giuseppe Di Fatta, University of Reading, UK Thomas Naughton, Christian Engelmann
More informationTechnical Brief: Specifying a PC for Mascot
Technical Brief: Specifying a PC for Mascot Matrix Science 8 Wyndham Place London W1H 1PP United Kingdom Tel: +44 (0)20 7723 2142 Fax: +44 (0)20 7725 9360 info@matrixscience.com http://www.matrixscience.com
More informationExperiences with the Parallel Virtual File System (PVFS) in Linux Clusters
Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters Kent Milfeld, Avijit Purkayastha, Chona Guiang Texas Advanced Computing Center The University of Texas Austin, Texas USA Abstract
More informationData Communication. Introduction of Communication. Data Communication. Elements of Data Communication (Communication Model)
Data Communication Introduction of Communication The need to communicate is part of man s inherent being. Since the beginning of time the human race has communicated using different techniques and methods.
More information