Work Project Report: Benchmark for 100 Gbps Ethernet network analysis


CERN Summer Student Programme 2016
Student: Iraklis Moutidis
Main supervisor: Balazs Voneki
Second supervisor: Dr. Niko Neufeld
Division: EP-LBC
Project ID:
Note reference: LHCb-PUB
Project name: Automatized benchmark for automatic launching, scheduling and analysing various aspects of the network.
LHCb-PUB /09/2016

Introduction

During the Long Shutdown 2 ( ) the LHCb experiment will be upgraded in order to reach extremely high precision on the main observables of the b- and c-quark sectors. In its current state the LHCb experiment has a readout rate of 1.1 MHz within a fixed latency, which limits the collision rate. Removing this bottleneck is one of the main objectives of the LHCb upgrade. This will be achieved by implementing a trigger-less readout system in which the functionality of the trigger is executed in software. The trigger-less readout system must be able to handle a bandwidth as large as 4 TBytes/s. For that reason the design of the system will include readout boards that can transmit data at a rate of 100 Gbits/s and a high-throughput local area network [1].

The main goal of my project is to create an automated benchmark for launching, scheduling and analysing performance tests of the readout boards and the network, in order to decide which manufacturer offers the best solution for our implementation, which network configuration offers the best performance and which configuration of the readout system is the most cost efficient.

Project Implementation

The automated benchmark was implemented in bash [2]. Its goal is to coordinate the connected nodes so that they transfer data to each other, and to measure the achieved bandwidth for the different configurations applied to the nodes. An example of the node network is given in Figure 1.
Each node sends data simultaneously to every node of the network. To transmit data between the nodes we used the iperf tool, version (6 Mar 2015). iperf is an open source tool for active measurements of the maximum achievable bandwidth on IP networks. It supports tuning of various parameters related to timing, buffers and protocols (TCP, UDP, SCTP with IPv4 and IPv6). For each test it reports the bandwidth, loss and other parameters. iperf was originally developed by NLANR/DAST [3].

To test the performance of the nodes accurately we had to make them transmit to each other simultaneously. To achieve that we used iperf to transmit data between the nodes and MPICH to synchronize them. MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (MPI-1, MPI-2 and MPI-3). The goals of MPICH are: (1) to provide an MPI implementation that efficiently supports different computation and communication platforms including commodity clusters (desktop systems, shared-memory systems, multicore architectures), high-speed networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics) and proprietary high-end computing systems (Blue Gene, Cray) and (2) to enable cutting-edge research in MPI through an easy-to-extend modular framework for other derived implementations [4].

We varied four parameters in the tested networks:

Number of processes (processors) per node. In our experiments we tested 2 to 32 processes with no CPU pinning.
Transmitting window size. The window size in our experiments varied from 8 Kbyte to 416 Kbyte and was increased by 8 Kbyte per test.
Transmission time. We used 10, 20, 30 and 40 seconds transmission time.
Number of nodes used on the network. We used 3 to 7 nodes in our experiments.

Lastly, in some cases I used the Python programming language to generate the scripts for the different tests. I did so because it was easier to form each mpi/iperf command dynamically, with different parameters depending on the configuration we wanted to apply for each test.

Figure 1: Node network example. Each node sends data simultaneously to every node of the network.
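The parameter sweep described above can be sketched in Python as follows. This is a minimal illustration and not the script used in the project: the host names are hypothetical, mapping "processes per node" to iperf's `-P` (parallel streams) option is an assumption, each node is assumed to send to every other node, and the MPI launch that synchronizes the nodes is left out. Only the parameter ranges come from the text.

```python
from itertools import product

# Parameter ranges taken from the text above.
PROCESSES = range(2, 33)          # 2 to 32 processes per node
WINDOWS_KB = range(8, 417, 8)     # 8 to 416 Kbyte in 8 Kbyte steps
DURATIONS_S = (10, 20, 30, 40)    # transmission time in seconds


def iperf_client_cmd(target, window_kb, duration_s, streams):
    """Build one iperf client command as an argument list.

    Assumption: "processes per node" is mapped to -P (parallel streams);
    the report does not say how the processes were actually started.
    """
    return ["iperf", "-c", target, "-w", f"{window_kb}K",
            "-t", str(duration_s), "-P", str(streams)]


def commands_for_node(sender, nodes, window_kb, duration_s, streams):
    """Commands one sender needs in order to transmit to every other node."""
    return [iperf_client_cmd(peer, window_kb, duration_s, streams)
            for peer in nodes if peer != sender]


def sweep(nodes):
    """Yield (processes, window_kb, duration_s, plan) for every test.

    plan maps each sender to the list of commands it has to run.
    """
    for procs, window_kb, duration_s in product(PROCESSES, WINDOWS_KB, DURATIONS_S):
        plan = {sender: commands_for_node(sender, nodes, window_kb, duration_s, procs)
                for sender in nodes}
        yield procs, window_kb, duration_s, plan


if __name__ == "__main__":
    hosts = ["nodeA", "nodeB", "nodeC"]   # hypothetical host names
    procs, window_kb, duration_s, plan = next(sweep(hosts))
    for sender, cmds in plan.items():
        for cmd in cmds:
            print(sender, "->", " ".join(cmd))
```

Building each command as an argument list rather than a single string keeps quoting out of the picture if the commands are later passed to a launcher.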

Results

We ran a number of experiments using the installed 100 Gbit/sec network cards in a number of configurations (as described in Project Implementation). Examining the results of the tests, we made two observations.

First, some nodes run faster than others; the reason for this behaviour needs further investigation. To verify that it occurs consistently, we ran a number of tests with different transmission durations (10, 20, 30 and 40 seconds) and also a test in which the nodes transmit data sequentially. In all of our tests some nodes (lab26) performed better than others. All the nodes in the experiments had identical hardware: 2 sockets; CPU: Intel(R) Xeon(R) CPU E GHz with 8 cores and 16 threads; RAM: 64 GB (8 x 8 GB 2133 MHz, but configured clock speed = 1866 MHz); BIOS with the default settings.

Second, the network often gets overloaded when we run the experiments on more than 4 nodes with a window size larger than approximately 100 Kbytes. In many cases the tests could not be completed for that reason. The error messages were "write failed connection reset by peer" and "connect failed: No route to host".

3 Nodes performance results

The highest bandwidth that we achieved in our tests using three nodes was Gbits/sec, with 28 processes and a window size of 176 Kbytes. Table 1 lists the top 5 configurations of the test, and Figure 2 presents the overall performance of the 3 node network.

Table 1: Configurations with the best performance for 3 nodes

4 Nodes performance results

For the 4 node configuration the highest achieved bandwidth was Gbits/sec, with 24 processes and a window size of 44 Kbytes. Table 2 lists the top 5 configurations of the test, and Figure 3 presents the performance of the 4 node network.

Table 2: Configurations with the best performance for 4 nodes
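A ranking of the best configurations, like the top 5 lists reported in the tables, could be produced along the following lines. This is only a sketch under stated assumptions: the report says awk was used for string manipulation, so the Python below is a stand-in, the iperf summary line is assumed to end in a rate such as `9.41 Gbits/sec`, and the `Result` type and function names are hypothetical. Any values passed in a usage example are placeholders, not measurements from the report.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches a rate such as "943 Mbits/sec" or "9.41 Gbits/sec" (assumed format).
_RATE = re.compile(r"([\d.]+)\s+([KMG])bits/sec")
_TO_GBITS = {"K": 1e-6, "M": 1e-3, "G": 1.0}


class Result(NamedTuple):
    """One tested configuration and its measured bandwidth (hypothetical type)."""
    processes: int
    window_kb: int
    gbits_per_sec: float


def parse_bandwidth_gbits(line: str) -> Optional[float]:
    """Return the rate found in an iperf report line in Gbits/sec, or None."""
    match = _RATE.search(line)
    if match is None:
        return None
    return float(match.group(1)) * _TO_GBITS[match.group(2)]


def top_configs(results: Iterable[Result], n: int = 5) -> List[Result]:
    """Return the n configurations with the highest bandwidth, best first."""
    return sorted(results, key=lambda r: r.gbits_per_sec, reverse=True)[:n]
```

For example, `top_configs(results)` on a list of `Result` entries returns at most five entries sorted by decreasing bandwidth.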

Figure 2: Overall results for 3 node tests.

Figure 3: Overall results for 4 node tests.

5 Nodes performance results

Running the test on 5 nodes was very problematic. Most of the time the network was overloaded and the test had to be terminated. We tried the full range of TCP buffer sizes, but it did not help. The root cause is not clear. We could measure the bandwidth only for 15 and 16 processes. The highest bandwidth that we achieved was Gbits/sec, with 15 processes and a window size of 148 Kbytes. Table 3 lists the top 5 configurations of the test, and Figure 4 presents the overall performance of the 5 node network.

Table 3: Configurations with the best performance for 5 nodes

7 Nodes performance results

For the 7 node configuration we could run the tests for 2 to 15 processes. Using more than 15 processes overloaded the network. The highest achieved bandwidth was Gbits/sec, with 13 processes and a window size of 196 Kbytes. Table 4 lists the top 5 configurations of the test, and Figure 5 presents the performance of the 7 node network.

Table 4: Configurations with the best performance for 7 nodes
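Runs that were cut short by an overloaded network could be separated from completed runs by scanning the captured output for the two error messages quoted in the Results section. The sketch below assumes that the output of each run is available as a string keyed by its configuration; the function names and the `(processes, window size)` key are hypothetical, and matching is done case-insensitively on the tail of each message because the exact punctuation of the tool output is not given in the report.

```python
from typing import Dict, List, Optional, Tuple

# Tails of the two error messages quoted in the report, in lower case.
FAILURE_MARKERS = ("connection reset by peer", "no route to host")

Config = Tuple[int, int]  # (processes, window size in Kbytes), hypothetical key


def failure_reason(output: str) -> Optional[str]:
    """Return the first failure marker found in a run's output, or None."""
    lowered = output.lower()
    for marker in FAILURE_MARKERS:
        if marker in lowered:
            return marker
    return None


def split_runs(outputs: Dict[Config, str]) -> Tuple[List[Config], Dict[Config, str]]:
    """Split runs into completed configurations and failed ones with a reason."""
    completed: List[Config] = []
    failed: Dict[Config, str] = {}
    for config, text in outputs.items():
        reason = failure_reason(text)
        if reason is None:
            completed.append(config)
        else:
            failed[config] = reason
    return completed, failed
```

Keeping the failure reason next to each configuration makes it easier to see for which process counts and window sizes the overload starts.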

Figure 4: Overall results for 5 node tests.

Figure 5: Overall results for 7 node tests.

Conclusion

During my stay at CERN I implemented an automated benchmark for launching, scheduling and analysing performance tests of the readout boards and the network. While working on this project I acquired a lot of knowledge about bash scripting and learned how to use tools for network benchmarking (iperf) and string manipulation (awk). The implemented benchmarking tool can test a given network in various configurations and help the user identify the best setup for maximum performance.

References

[1] LHCb Trigger and Online Upgrade Technical Design Report. European Organization for Nuclear Research (CERN), CERN/LHCC.
[2] "Wikipedia," [Online]. Available:
[3] "iperf The network bandwidth measurement tool," [Online]. Available:
[4] "MPICH," [Online]. Available:
