Parallel computation performances of Serpent and Serpent 2 on KTH Parallel Dator Centrum


KTH ROYAL INSTITUTE OF TECHNOLOGY, SH2704, 9 MAY

Belle Andrea, Pourcelot Gregoire

Abstract

The aim of this project was to investigate the computational efficiency of Serpent and Serpent 2 on the KTH supercomputer. Several simulations were run with different input parameters and parallel-mode configurations, in order to obtain a broad view of the parallelization process. The increases and decreases in computation time caused by changing these parameters and configurations were studied.

I. INTRODUCTION

Parallel calculation in a Monte Carlo code such as Serpent or Serpent 2 consists of splitting the size and the computational cost of a simulation into several parts, so that the computation time is likely to decrease. This kind of application nevertheless often has to be run on a powerful machine. For this project, the KTH supercomputer, the Parallel Dator Centrum (PDC), was used for all the simulations. The codes used were Serpent, version , and Serpent 2, version .

A. Parallel Dator Centrum

The Parallel Dator Centrum, or PDC, is the KTH supercomputer. It consists of two main parts, called clusters: Beskow and Tegner. Each of these machines is made up of several units called nodes, and each node contains many cores, or CPUs. Many hardware configurations are available for parallel calculation, depending on the required computing power and the complexity of the simulation [1].

B. PDC and Serpent configuration

As mentioned above, Serpent version and Serpent 2 version were used on the Tegner machine. In particular, a specific part of Tegner with 46 nodes was available. Each node has 24 Intel E5-2690v3 Haswell cores in a 2x12 configuration and 512 GB of RAM [1].
The codes were compiled using gcc/7.2.0 and openmpi/3.0-gcc-7.2, and each simulation was launched through a batch script submitted with sbatch, in order to follow the security procedure of the supercomputer.

C. Parallel computation mode

Both Serpent and Serpent 2 support parallel calculations. In Serpent, only the MPI (Message Passing Interface) mode is available. It consists of splitting the simulation into a specific number of parts, called tasks. The total available memory is distributed among the tasks. Each task runs a small part of the total simulation, and the results are combined at the end of the whole simulation using the independent simulations scheme. In this particular case, each node was divided into 24 MPI tasks, each corresponding to a single core. The batch size, that is, the number of neutron histories simulated per cycle, is divided among the cores in use, and the results are then combined using the aforementioned independent simulations scheme.

Serpent 2 supports both the MPI and the OpenMP parallel modes. The MPI mode is the same as the one described for Serpent. The OpenMP mode instead splits the simulation into a certain number of parts called threads. The memory is not divided among the threads; it is shared by all of them.

Serpent 2 can also use the so-called Hybrid MPI-OpenMP mode, which combines the features of the two modes in order to find an optimal configuration. Each node can be divided into several MPI tasks, and each task can be divided into several OpenMP threads. The total memory is therefore divided equally among the MPI tasks, and the memory of each MPI task is shared by the OpenMP threads inside that task. In this case the batch size is split among the MPI tasks, and within a task each neutron history is simulated in a different thread. The results of the OpenMP threads are then combined using a kind of master/slave scheme.
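The hybrid decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Serpent 2's actual scheduling: the names `HybridLayout` and `hybrid_layout` are hypothetical, and the even split of histories among tasks is assumed. The node size (24 cores, 512 GB) is taken from the Tegner description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HybridLayout:
    total_tasks: int               # MPI tasks over all nodes
    threads_per_task: int          # OpenMP threads inside each MPI task
    memory_per_task_gb: float      # node memory divided equally among its tasks
    histories_per_task: List[int]  # share of the batch handled by each task


def hybrid_layout(nodes: int, tasks_per_node: int, batch_size: int,
                  cores_per_node: int = 24,
                  memory_per_node_gb: float = 512.0) -> HybridLayout:
    """Split a batch among MPI tasks and cores among OpenMP threads.

    Assumption (hypothetical): histories are divided as evenly as possible,
    with the remainder spread one by one over the first tasks.
    """
    if cores_per_node % tasks_per_node != 0:
        raise ValueError("tasks_per_node must divide cores_per_node")
    threads = cores_per_node // tasks_per_node
    total_tasks = nodes * tasks_per_node
    memory = memory_per_node_gb / tasks_per_node
    base, extra = divmod(batch_size, total_tasks)
    histories = [base + (1 if i < extra else 0) for i in range(total_tasks)]
    return HybridLayout(total_tasks, threads, memory, histories)


if __name__ == "__main__":
    # 3 nodes, 4 MPI tasks per node, batch of 50,000 histories
    print(hybrid_layout(nodes=3, tasks_per_node=4, batch_size=50_000))
```

Within each task the threads share `memory_per_task_gb`, whereas in pure MPI mode (24 tasks per node, 1 thread each) every core has its own, much smaller, memory share.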
Results from different MPI tasks are then combined at the end as independent simulations.

II. SIMULATION PROCEDURE

A. Serpent input files

A single input file was used for the main series of simulations. It describes a 2D BWR fuel assembly [2], whose geometry is shown in Fig. 1. Each pin has a pitch of cm, and the assembly has a pitch of cm. The fuel is UO2 with different concentrations of 235U and 238U. In the figure, different pin colors correspond to different levels of enrichment. In some fuel pins, shown in blue in the figure, the uranium dioxide is mixed with gadolinium. The moderator is light water, and the cladding and box material is a zirconium alloy. Several types of detectors are also present.

Fig. 1. Geometry of the BWR fuel assembly

In order to evaluate the influence of the input geometry on the computational efficiency, a different type of fuel assembly, shown in Fig. 2, was also used. It is a 2D CANDU fuel cluster [2]. The fuel material is uranium dioxide, UO2, with a 235U enrichment of 0.7% (natural uranium). The moderator is heavy water (D2O) and the structural materials are different zirconium alloys. No detector is present.

Fig. 2. Geometry of the CANDU fuel assembly

Different combinations of batch size and active/inactive cycles were used during the study, in order to optimize the simulations and to evaluate the influence of the batch size on the efficiency. The same seed (1.5E7) was used for all the simulations, in order to preserve the random number sequence and to keep the results unbiased by statistical fluctuations.

B. Speed-up parameter

The main goal of this study is to evaluate how the computation time changes with various input parameters and hardware configurations, such as the number of cores or nodes involved in the parallelization. The easiest way to evaluate the efficiency of a simulation is to use the figure of merit:

$$\mathrm{FOM} = \frac{1}{\sigma^2 t}$$

with:
FOM = figure of merit
σ = standard deviation (σ² is the variance)
t = computation time.

The main parameter for analyzing the efficiency of a parallel simulation is the speed-up parameter, defined by Gene Amdahl [3] with the following formula:

$$s = \frac{1}{(1 - F) + \dfrac{F}{N}}$$

with:
s = speed-up parameter
F = parallelizable fraction of the simulation
N = number of processors used in the simulation.

For simplicity, the speed-up parameter can also be taken as the ratio of the FOM of the simulation being evaluated to the FOM of a reference simulation. In this study, the seed, the batch size and the number of active/inactive cycles are kept the same within each series of simulations [2], and therefore the standard deviation can be considered constant. The speed-up parameter can therefore be expressed as:

$$s = \frac{\mathrm{FOM}(n)}{\mathrm{FOM}(\mathrm{Ref})} = \frac{\sigma^2 \, t(\mathrm{Ref})}{\sigma^2 \, t(n)} = \frac{t(\mathrm{Ref})}{t(n)}$$

In the case of the MPI mode, the computation time of the simulation using one core was taken as the reference for the speed-up parameter. In the case of the Hybrid MPI-OpenMP mode with Serpent 2, only the computation time was taken into account in the evaluation of the results.

C. Results evaluation

Each series of simulations was evaluated by looking at how the speed-up and the actual computation time in seconds change as the number of cores and nodes increases. Only the computation times reported in the Serpent output file were evaluated. The computation times in the PDC output files were slightly longer, because they include the execution and procedure time required by the supercomputer, and including this extra time would have biased the results. Each simulation series used up to three nodes.

III. MPI MODE RESULTS

A. Serpent and Serpent 2 comparison

The first series of simulations was run using the BWR input file, the MPI mode for both Serpent and Serpent 2, a batch size of 20,000 neutrons, 5,000 active cycles and 200 inactive cycles. The speed-up parameter and the computation time were evaluated for both Serpent and Serpent 2, and then compared. The simulations were run using different numbers of MPI tasks: [1, 28], 30, 32, 36, 40, 44, [48, 52], where the brackets denote ranges. Each task corresponded to one core, or CPU. These values were chosen to sample more finely the transition between the first and the second node, and between the second and the third one.

The computation time is plotted in Fig. 3 and the speed-up parameter in Fig. 4. It can be clearly seen that, as the number of cores involved in the parallel simulation increases, the computation time decreases steeply, approximately in inverse proportion to the number of cores, and the speed-up parameter increases linearly.

Fig. 3. Plot of computation time and number of cores for BWR with batch size 20,000 neutrons, 5,000 active cycles, 200 inactive cycles

Fig. 4. Plot of speed-up parameter and number of cores for BWR with batch size 20,000 neutrons, 5,000 active cycles, 200 inactive cycles

From Fig. 4 it can be noticed that the patterns of the speed-up parameter of Serpent and Serpent 2 are similar, and that both can be approximated with a linear function. The fitting data are given in Table I.

TABLE I
SPEED-UP PARAMETER FITTING FOR BWR

Serpent        Serpent 2
y = 0.7968x    y = 0.7707x

The slope of the linear function of the speed-up parameter is 0.7968 for Serpent and 0.7707 for Serpent 2. This means that the speed-up parameter increases, and the computation time decreases, slightly faster in Serpent than in Serpent 2. The slope is, as expected, smaller than 1: the decrease of the computation time is not perfectly inversely proportional to the increase of the number of cores. For example, when using two cores rather than one, the computation time is not half of the previous one, but slightly more. This phenomenon is known as overhead [4], and it is due to the communication, execution and process time required by the machine that performs the parallel simulation. Serpent seems to be slightly less affected by this factor.

Nevertheless, Serpent 2 is more stable and less prone to instabilities when an extra node is needed. The pattern of its speed-up parameter is indeed more linear and shows fewer fluctuations. Serpent, on the other hand, presents a more unstable pattern, with a slight instability between the first and the second node, and a more pronounced fluctuation between the second and the third one. All these differences are probably due to the different internal architecture of the codes.

B. Influence of the geometry

The influence of the geometry was evaluated by running a series of simulations with the same numbers of cores as in the previous series, but using the CANDU cluster geometry. The results for the computation time and the speed-up parameter are shown in Fig. 5 and Fig. 6 respectively. Both plots are very similar to the previous ones for the BWR assembly geometry.

Fig. 5. Plot of computation time and number of cores for CANDU with batch size 20,000 neutrons, 5,000 active cycles, 200 inactive cycles

Fig. 6. Plot of speed-up parameter and number of cores for CANDU with batch size 20,000 neutrons, 5,000 active cycles, 200 inactive cycles

TABLE II
SPEED-UP PARAMETER FITTING FOR CANDU

Serpent        Serpent 2
y = 0.7911x    y = 0.7806x

The plot of the speed-up parameter was again approximated using linear functions, given in Table II. The slope of the Serpent linear function is slightly bigger, and Serpent again seems to be slightly more efficient. The slopes differ by less than 5%, with either the BWR or the CANDU geometry. Moreover, the patterns of the speed-up parameter are very similar. Using Serpent, with either the BWR or the CANDU geometry, the pattern is more irregular, with more pronounced instabilities in the interface region between different nodes. Using Serpent 2, the pattern is more regular and the fluctuations are less pronounced.

It can therefore be concluded that, in this case, a different geometry does not bring any considerable change in the computation time efficiency of Serpent and Serpent 2. The small differences between the two series are not particularly relevant, and they are probably caused by statistical fluctuations due to the different input file.

C. Influence of batch size

The batch size was changed from 20,000 to 50,000 neutron histories per cycle. The results are shown in Fig. 7 and Fig. 8 and in Table III. As can be clearly seen, the results are similar to the previous ones, and the batch size does not seem to have a considerable impact on the pattern of the computation time and the speed-up parameter. Serpent is still slightly more efficient and more unstable, and its fluctuations between the second and the third node are more pronounced than those between the first and the second node. Serpent 2 seems to be more regular in its trend. The fitting slopes of the speed-up parameter are similar and comparable to the previous ones.

Fig. 7. Plot of computation time and number of cores for BWR with batch size 50,000 neutrons, 5,000 active cycles, 200 inactive cycles

Fig. 8. Plot of speed-up parameter and number of cores for BWR with batch size 50,000 neutrons, 5,000 active cycles, 200 inactive cycles

TABLE III
SPEED-UP PARAMETER FITTING FOR BWR, 50,000 NEUTRONS, 5,000 ACTIVE CYCLES, 200 INACTIVE CYCLES

Serpent        Serpent 2
y = 0.8047x    y = 0.7772x

D. Influence of the number of active/inactive cycles

The influence of the number of active and inactive cycles was evaluated with this series of simulations. The numbers of cores used were again [1, 28], 30, 32, 36, 40, 44, [48, 52], the batch size was 50,000 neutrons, and the numbers of active and inactive cycles were 12,500 and 500. The plots of the computation time and the speed-up parameter are shown in Fig. 9 and Fig. 10. The results are again similar to the previous ones, with a steep, approximately inverse-proportional decrease of the computation time and a linear increase of the speed-up parameter. The data of the linear fitting of the speed-up parameter are given in Table IV.

Fig. 9. Plot of computation time and number of cores for BWR with batch size 50,000, 12,500 active cycles, 500 inactive cycles

Fig. 10. Plot of speed-up parameter and number of cores for BWR with batch size 50,000, 12,500 active cycles, 500 inactive cycles

TABLE IV
SPEED-UP PARAMETER FITTING FOR BWR, 50,000 NEUTRONS, 12,500 ACTIVE CYCLES, 500 INACTIVE CYCLES

Serpent        Serpent 2
y = 0.7914x    y = 0.7755x

The slopes of the speed-up parameter plots are comparable with the previous ones, and the efficiency in terms of computation time can be considered the same. The big difference with respect to the previous results is the pronounced fluctuations of Serpent in the interface region between nodes. The instabilities between the first and the second node, and between the second and the third one, are bigger than before. In particular, the fluctuation of the speed-up parameter between the first and the second node is considerably bigger than in the previous simulations. Serpent 2, on the other hand, confirmed its more stable behavior.

This different behavior is explained by the internal differences of the codes. Serpent could be more prone to instabilities because of the way it splits the batch size among the MPI tasks. When a new node becomes necessary due to the increasing number of tasks, Serpent probably requires more time than Serpent 2 to split the batch size when only one or two cores of the new node are included in the simulation. Another factor could be the communication and process time at the beginning and at the end of the simulation: when only a few cores of a new node are used, the communication between MPI tasks may not be optimized. The fluctuations could therefore be caused by the architecture of the Serpent code, and a higher number of active/inactive cycles, and therefore longer simulations, seems to increase the magnitude of the fluctuations in Serpent.

IV. HYBRID MPI-OPENMP RESULTS

The Hybrid MPI-OpenMP mode was evaluated on 3 nodes, starting from a pure MPI mode and ending with a pure OpenMP mode, as listed in Table V. These combinations were evaluated using four series of simulations, with different geometries (BWR and CANDU) and different batch sizes (20,000 and 50,000), as shown in Fig. 11, 12, 13 and 14.

TABLE V
HYBRID MPI-OPENMP COMBINATIONS

Total MPI tasks    MPI tasks per node    OpenMP threads per task

The results were evaluated using only the computation time, and they showed a similar trend. Passing from a pure MPI mode, with a total of 72 MPI tasks (24 per node) and 1 OpenMP thread per task, to a hybrid mode with 12 MPI tasks (4 per node) and 6 threads per task, the computation time slightly decreases. Using 9 MPI tasks (3 per node) with 8 threads per task, the computation time increases considerably. This is due to the hardware architecture of the Haswell nodes used for the simulations. Each node has 24 cores, divided into 2 separate blocks of 12 cores each. If a node is divided into 3 MPI tasks with 8 threads (cores) per task, one of the tasks has four cores in one block and the other four cores in the other block: the memory of this task is shared across the blocks, and additional process time is needed for the communication between the two blocks. The same consideration applies to the last point of each simulation series, where each node runs 1 single MPI task with 24 OpenMP threads: in this case too, the communication time needed between the two blocks has a negative effect on the total computation time.

Fig. 11. Plot of computation time and number OpenMP threads for BWR with batch size 20,000, 5,000 active cycles, 200 inactive cycles

Fig. 12. Plot of speed-up parameter and number OpenMP threads for BWR with batch size 50,000, 5,000 active cycles, 200 inactive cycles

Fig. 13. Plot of computation time and number OpenMP threads for CANDU with batch size 20,000, 5,000 active cycles, 200 inactive cycles

Fig. 14. Plot of computation time and number OpenMP threads for CANDU with batch size 50,000, 5,000 active cycles, 200 inactive cycles

V. CONCLUSION

A. MPI mode

As the number of cores used per simulation increases, both Serpent and Serpent 2 present a steep, approximately inverse-proportional decrease of the computation time. The speed-up parameter increases linearly in both cases. The fitting slopes are similar, and they can be approximated to a value of 0.78±0.04 in all the simulations. This value of the speed-up slope indicates good efficiency.

Some clear differences between the two codes emerged during the study. Serpent seems to be slightly more efficient, since the value of its speed-up slope is always somewhat higher than the one for Serpent 2. On the other hand, Serpent 2 presents a more stable trend, with very small instabilities; Serpent instead shows more pronounced fluctuations in the interface region between nodes.

Geometry and batch size do not seem to have a considerable influence on the results of either Serpent or Serpent 2: the computation time and speed-up trends are similar. The number of active/inactive cycles seems to have a stronger influence in Serpent. It was shown that the longer the simulations, the more pronounced the instabilities, especially when passing from one to two nodes by adding only one or two cores of the new node. In this particular case, shown in Fig. 10, it can be clearly seen that adding an extra core to the parallel simulation is not always an advantage for the computational efficiency, since it can lead to an increase of the computation time. Serpent 2, on the other hand, did not show any considerable changes.

The differences between Serpent and Serpent 2 should be ascribed to intrinsic differences in the code architecture. In particular, some specific reasons could be the way the batch size is split among the cores, the communication system between the independent parts of the simulation, and the method of collecting and combining the results using the independent simulations scheme.

B. Hybrid MPI-OpenMP

The results of the Hybrid MPI-OpenMP mode show a similar pattern for different geometries and batch sizes. The computation time seems to be strongly influenced by the internal hardware architecture of the supercomputer. In particular, the division of each node into 24 cores, grouped into two blocks of 12 cores, plays a key role. If the communication time is not optimized, because of the way the node is divided into tasks and threads, the total computation time increases. This is confirmed by the 6th point (3 MPI tasks per node, 8 OpenMP threads per task) and the 8th point (1 MPI task per node, 24 OpenMP threads per task) of each simulation series. The most efficient point of each simulation series is the 5th one (4 MPI tasks per node, 6 OpenMP threads per task). The other points (1st, 2nd, 3rd, 4th, 7th) are also quite efficient: their computation times differ by less than 10% from the most efficient one. These minor differences and their causes are difficult to evaluate, and they would require a deeper investigation.

REFERENCES

[1] (last access 3 April 2018).
[2] Serpent - a Continuous-energy Monte Carlo Reactor Physics Burnup Calculation Code, User's Manual, Jaakko Leppänen, 18 June
[3] law (last access 3 April 2018).
[4] comp/ (last access 3 April 2018).
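The block-boundary argument used for the hybrid results can be checked with a short Python sketch. It rests on assumptions that the report does not state: each task's threads are assumed to occupy consecutive core indices, cores 0-11 and 12-23 are assumed to form the two blocks of a node, and the list of tasks-per-node values (all divisors of 24) is illustrative rather than taken from Table V. The function name `tasks_spanning_blocks` is hypothetical.

```python
from typing import List


def tasks_spanning_blocks(tasks_per_node: int,
                          cores_per_node: int = 24,
                          cores_per_block: int = 12) -> List[int]:
    """Return the indices of the MPI tasks whose threads sit in both blocks.

    Assumption (hypothetical): task k uses the consecutive cores
    k * threads ... (k + 1) * threads - 1 of its node.
    """
    if cores_per_node % tasks_per_node != 0:
        raise ValueError("tasks_per_node must divide cores_per_node")
    threads = cores_per_node // tasks_per_node
    spanning = []
    for task in range(tasks_per_node):
        first_core = task * threads
        last_core = first_core + threads - 1
        # A task spans two blocks when its first and last cores
        # belong to different 12-core blocks.
        if first_core // cores_per_block != last_core // cores_per_block:
            spanning.append(task)
    return spanning


if __name__ == "__main__":
    for tasks in (24, 12, 8, 6, 4, 3, 2, 1):
        threads = 24 // tasks
        print(tasks, "tasks x", threads, "threads:",
              tasks_spanning_blocks(tasks) or "no task spans both blocks")
```

Under these assumptions only the 3-task and 1-task layouts place a task across both blocks, which matches the two slow points reported in the study. The actual placement depends on the binding options of the MPI launcher and of the batch system, so the sketch illustrates the argument rather than reproducing the real core mapping.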


More information

Introduction to Parallel Programming. Tuesday, April 17, 12

Introduction to Parallel Programming. Tuesday, April 17, 12 Introduction to Parallel Programming 1 Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational

More information

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy.

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy. Math 340 Fall 2014, Victor Matveev Binary system, round-off errors, loss of significance, and double precision accuracy. 1. Bits and the binary number system A bit is one digit in a binary representation

More information

Parallel Mesh Partitioning in Alya

Parallel Mesh Partitioning in Alya Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es

More information

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M

More information

Lecture 7 Notes: 07 / 11. Reflection and refraction

Lecture 7 Notes: 07 / 11. Reflection and refraction Lecture 7 Notes: 07 / 11 Reflection and refraction When an electromagnetic wave, such as light, encounters the surface of a medium, some of it is reflected off the surface, while some crosses the boundary

More information

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc. Debugging CUDA Applications with Allinea DDT Ian Lumb Sr. Systems Engineer, Allinea Software Inc. ilumb@allinea.com GTC 2013, San Jose, March 20, 2013 Embracing GPUs GPUs a rival to traditional processors

More information

THE BENEFIT OF ANSA TOOLS IN THE DALLARA CFD PROCESS. Simona Invernizzi, Dallara Engineering, Italy,

THE BENEFIT OF ANSA TOOLS IN THE DALLARA CFD PROCESS. Simona Invernizzi, Dallara Engineering, Italy, THE BENEFIT OF ANSA TOOLS IN THE DALLARA CFD PROCESS Simona Invernizzi, Dallara Engineering, Italy, KEYWORDS automatic tools, batch mesh, DFM, morphing, ride height maps ABSTRACT In the last few years,

More information

The determination of the correct

The determination of the correct SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total

More information

Review of previous examinations TMA4280 Introduction to Supercomputing

Review of previous examinations TMA4280 Introduction to Supercomputing Review of previous examinations TMA4280 Introduction to Supercomputing NTNU, IMF April 24. 2017 1 Examination The examination is usually comprised of: one problem related to linear algebra operations with

More information

Using the Eulerian Multiphase Model for Granular Flow

Using the Eulerian Multiphase Model for Granular Flow Tutorial 21. Using the Eulerian Multiphase Model for Granular Flow Introduction Mixing tanks are used to maintain solid particles or droplets of heavy fluids in suspension. Mixing may be required to enhance

More information

SELECTION OF A MULTIVARIATE CALIBRATION METHOD

SELECTION OF A MULTIVARIATE CALIBRATION METHOD SELECTION OF A MULTIVARIATE CALIBRATION METHOD 0. Aim of this document Different types of multivariate calibration methods are available. The aim of this document is to help the user select the proper

More information

Importance Sampling Spherical Harmonics

Importance Sampling Spherical Harmonics Importance Sampling Spherical Harmonics Wojciech Jarosz 1,2 Nathan A. Carr 2 Henrik Wann Jensen 1 1 University of California, San Diego 2 Adobe Systems Incorporated April 2, 2009 Spherical Harmonic Sampling

More information

A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer

A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer Stéphane Vialle, Xavier Warin, Patrick Mercier To cite this version: Stéphane Vialle,

More information

Edge-Preserving Denoising for Segmentation in CT-Images

Edge-Preserving Denoising for Segmentation in CT-Images Edge-Preserving Denoising for Segmentation in CT-Images Eva Eibenberger, Anja Borsdorf, Andreas Wimmer, Joachim Hornegger Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg

More information

Assembly dynamics of microtubules at molecular resolution

Assembly dynamics of microtubules at molecular resolution Supplementary Information with: Assembly dynamics of microtubules at molecular resolution Jacob W.J. Kerssemakers 1,2, E. Laura Munteanu 1, Liedewij Laan 1, Tim L. Noetzel 2, Marcel E. Janson 1,3, and

More information

simulation framework for piecewise regular grids

simulation framework for piecewise regular grids WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally

More information

Q: Which month has the lowest sale? Answer: Q:There are three consecutive months for which sale grow. What are they? Answer: Q: Which month

Q: Which month has the lowest sale? Answer: Q:There are three consecutive months for which sale grow. What are they? Answer: Q: Which month Lecture 1 Q: Which month has the lowest sale? Q:There are three consecutive months for which sale grow. What are they? Q: Which month experienced the biggest drop in sale? Q: Just above November there

More information

The Why and How of HPC-Cloud Hybrids with OpenStack

The Why and How of HPC-Cloud Hybrids with OpenStack The Why and How of HPC-Cloud Hybrids with OpenStack OpenStack Australia Day Melbourne June, 2017 Lev Lafayette, HPC Support and Training Officer, University of Melbourne lev.lafayette@unimelb.edu.au 1.0

More information

Investigation of Intel MIC for implementation of Fast Fourier Transform

Investigation of Intel MIC for implementation of Fast Fourier Transform Investigation of Intel MIC for implementation of Fast Fourier Transform Soren Goyal Department of Physics IIT Kanpur e-mail address: soren@iitk.ac.in The objective of the project was to run the code for

More information

v MODFLOW Stochastic Modeling, Parameter Randomization GMS 10.3 Tutorial

v MODFLOW Stochastic Modeling, Parameter Randomization GMS 10.3 Tutorial v. 10.3 GMS 10.3 Tutorial MODFLOW Stochastic Modeling, Parameter Randomization Run MODFLOW in Stochastic (Monte Carlo) Mode by Randomly Varying Parameters Objectives Learn how to develop a stochastic (Monte

More information

Investigations into Alternative Radiation Transport Codes for ITER Neutronics Analysis

Investigations into Alternative Radiation Transport Codes for ITER Neutronics Analysis CCFE-PR(17)10 Andrew Turner Investigations into Alternative Radiation Transport Codes for ITER Neutronics Analysis Enquiries about copyright and reproduction should in the first instance be addressed to

More information

v Prerequisite Tutorials Required Components Time

v Prerequisite Tutorials Required Components Time v. 10.0 GMS 10.0 Tutorial MODFLOW Stochastic Modeling, Parameter Randomization Run MODFLOW in Stochastic (Monte Carlo) Mode by Randomly Varying Parameters Objectives Learn how to develop a stochastic (Monte

More information

Bagging & System Combination for POS Tagging. Dan Jinguji Joshua T. Minor Ping Yu

Bagging & System Combination for POS Tagging. Dan Jinguji Joshua T. Minor Ping Yu Bagging & System Combination for POS Tagging Dan Jinguji Joshua T. Minor Ping Yu Bagging Bagging can gain substantially in accuracy The vital element is the instability of the learning algorithm Bagging

More information

Enemy Territory Traffic Analysis

Enemy Territory Traffic Analysis Enemy Territory Traffic Analysis Julie-Anne Bussiere *, Sebastian Zander Centre for Advanced Internet Architectures. Technical Report 00203A Swinburne University of Technology Melbourne, Australia julie-anne.bussiere@laposte.net,

More information

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Why serial is not enough Computing architectures Parallel paradigms Message Passing Interface How

More information

Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf

Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf PADC Anual Workshop 20 Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture Alexander Berreth RECOM Services GmbH, Stuttgart Markus Bühler, Benedikt Anlauf IBM Deutschland

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

A recipe for fast(er) processing of netcdf files with Python and custom C modules

A recipe for fast(er) processing of netcdf files with Python and custom C modules A recipe for fast(er) processing of netcdf files with Python and custom C modules Ramneek Maan Singh a, Geoff Podger a, Jonathan Yu a a CSIRO Land and Water Flagship, GPO Box 1666, Canberra ACT 2601 Email:

More information

A FLEXIBLE COUPLING SCHEME FOR MONTE CARLO AND THERMAL-HYDRAULICS CODES

A FLEXIBLE COUPLING SCHEME FOR MONTE CARLO AND THERMAL-HYDRAULICS CODES International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (M&C 2011) Rio de Janeiro, RJ, Brazil, May 8-12, 2011, on CD-ROM, Latin American Section (LAS)

More information

The p-sized partitioning algorithm for fast computation of factorials of numbers

The p-sized partitioning algorithm for fast computation of factorials of numbers J Supercomput (2006) 38:73 82 DOI 10.1007/s11227-006-7285-5 The p-sized partitioning algorithm for fast computation of factorials of numbers Ahmet Ugur Henry Thompson C Science + Business Media, LLC 2006

More information

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Agenda 1 Agenda-Day 1 HPC Overview What is a cluster? Shared v.s. Distributed Parallel v.s. Massively Parallel Interconnects

More information

Position Paper: OpenMP scheduling on ARM big.little architecture

Position Paper: OpenMP scheduling on ARM big.little architecture Position Paper: OpenMP scheduling on ARM big.little architecture Anastasiia Butko, Louisa Bessad, David Novo, Florent Bruguier, Abdoulaye Gamatié, Gilles Sassatelli, Lionel Torres, and Michel Robert LIRMM

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Theoretical Investigations of Tomographic Methods used for Determination of the Integrity of Spent BWR Nuclear Fuel

Theoretical Investigations of Tomographic Methods used for Determination of the Integrity of Spent BWR Nuclear Fuel a UPPSALA UNIVERSITY Department of Radiation Sciences Box 535, S-751 1 Uppsala, Sweden http://www.tsl.uu.se/ Internal report ISV-6/97 August 1996 Theoretical Investigations of Tomographic Methods used

More information

State of the art of Monte Carlo technics for reliable activated waste evaluations

State of the art of Monte Carlo technics for reliable activated waste evaluations State of the art of Monte Carlo technics for reliable activated waste evaluations Matthieu CULIOLI a*, Nicolas CHAPOUTIER a, Samuel BARBIER a, Sylvain JANSKI b a AREVA NP, 10-12 rue Juliette Récamier,

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

On the Performance of MapReduce: A Stochastic Approach

On the Performance of MapReduce: A Stochastic Approach On the Performance of MapReduce: A Stochastic Approach Sarker Tanzir Ahmed and Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M University October 28, 2014

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition

MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition Yun He and Chris H.Q. Ding NERSC Division, Lawrence Berkeley National

More information

Parallel Performance Studies for a Clustering Algorithm

Parallel Performance Studies for a Clustering Algorithm Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,

More information

The Pennsylvania State University. The Graduate School. Department of Mechanical and Nuclear Engineering

The Pennsylvania State University. The Graduate School. Department of Mechanical and Nuclear Engineering The Pennsylvania State University The Graduate School Department of Mechanical and Nuclear Engineering IMPROVED REFLECTOR MODELING FOR LIGHT WATER REACTOR ANALYSIS A Thesis in Nuclear Engineering by David

More information

CURRICULUM UNIT MAP 1 ST QUARTER

CURRICULUM UNIT MAP 1 ST QUARTER 1 ST QUARTER Unit 1: Pre- Algebra Basics I WEEK 1-2 OBJECTIVES Apply properties for operations to positive rational numbers and integers Write products of like bases in exponential form Identify and use

More information

Accelerating GATE simulations

Accelerating GATE simulations GATE Simulations of Preclinical andclinical Scans in Emission Tomography, Transmission Tomography and Radiation Therapy Accelerating GATE simulations Parallel computing and GPU GATE Training, INSTN-Saclay,

More information

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS Performance Analysis of Java NativeThread and NativePthread on Win32 Platform Bala Dhandayuthapani Veerasamy Research Scholar Manonmaniam Sundaranar University Tirunelveli, Tamilnadu, India dhanssoft@gmail.com

More information

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS Chapter 6 Indexing Results 6. INTRODUCTION The generation of inverted indexes for text databases is a computationally intensive process that requires the exclusive use of processing resources for long

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

Detecting Polytomous Items That Have Drifted: Using Global Versus Step Difficulty 1,2. Xi Wang and Ronald K. Hambleton

Detecting Polytomous Items That Have Drifted: Using Global Versus Step Difficulty 1,2. Xi Wang and Ronald K. Hambleton Detecting Polytomous Items That Have Drifted: Using Global Versus Step Difficulty 1,2 Xi Wang and Ronald K. Hambleton University of Massachusetts Amherst Introduction When test forms are administered to

More information

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

vsan 6.6 Performance Improvements First Published On: Last Updated On:

vsan 6.6 Performance Improvements First Published On: Last Updated On: vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

Intel MPI Library Conditional Reproducibility

Intel MPI Library Conditional Reproducibility 1 Intel MPI Library Conditional Reproducibility By Michael Steyer, Technical Consulting Engineer, Software and Services Group, Developer Products Division, Intel Corporation Introduction High performance

More information

CS 229: Machine Learning Final Report Identifying Driving Behavior from Data

CS 229: Machine Learning Final Report Identifying Driving Behavior from Data CS 9: Machine Learning Final Report Identifying Driving Behavior from Data Robert F. Karol Project Suggester: Danny Goodman from MetroMile December 3th 3 Problem Description For my project, I am looking

More information

error

error PARALLEL IMPLEMENTATION OF STOCHASTIC ITERATION ALGORITHMS Roel Mart nez, László Szirmay-Kalos, Mateu Sbert, Ali Mohamed Abbas Department of Informatics and Applied Mathematics, University of Girona Department

More information

Whitepaper Spain SEO Ranking Factors 2012

Whitepaper Spain SEO Ranking Factors 2012 Whitepaper Spain SEO Ranking Factors 2012 Authors: Marcus Tober, Sebastian Weber Searchmetrics GmbH Greifswalder Straße 212 10405 Berlin Phone: +49-30-3229535-0 Fax: +49-30-3229535-99 E-Mail: info@searchmetrics.com

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

arxiv: v1 [cs.dc] 2 Apr 2016

arxiv: v1 [cs.dc] 2 Apr 2016 Scalability Model Based on the Concept of Granularity Jan Kwiatkowski 1 and Lukasz P. Olech 2 arxiv:164.554v1 [cs.dc] 2 Apr 216 1 Department of Informatics, Faculty of Computer Science and Management,

More information

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rcuda Virtualization

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rcuda Virtualization Exploiting Task-Parallelism on Clusters via Adrián Castelló, Rafael Mayo, Judit Planas, Enrique S. Quintana-Ortí RePara 2015, August Helsinki, Finland Exploiting Task-Parallelism on Clusters via Power/energy/utilization

More information

Evaluation of RAPID for a UNF cask benchmark problem

Evaluation of RAPID for a UNF cask benchmark problem Evaluation of RAPID for a UNF cask benchmark problem Valerio Mascolino 1,a, Alireza Haghighat 1,b, and Nathan J. Roskoff 1,c 1 Nuclear Science & Engineering Lab (NSEL), Virginia Tech, 900 N Glebe Rd.,

More information

Hyper-Threading Influence on CPU Performance

Hyper-Threading Influence on CPU Performance João Martins* Jorge Gomes* Mario David* Gonçalo Borges* * LIP Laboratório de Instrumentação e Física Experimental de Particulas HePiX Spring

More information

arxiv: v1 [cs.dc] 27 Sep 2018

arxiv: v1 [cs.dc] 27 Sep 2018 Performance of MPI sends of non-contiguous data Victor Eijkhout arxiv:19.177v1 [cs.dc] 7 Sep 1 1 Abstract We present an experimental investigation of the performance of MPI derived datatypes. For messages

More information

Optimised corrections for finite-difference modelling in two dimensions

Optimised corrections for finite-difference modelling in two dimensions Optimized corrections for 2D FD modelling Optimised corrections for finite-difference modelling in two dimensions Peter M. Manning and Gary F. Margrave ABSTRACT Finite-difference two-dimensional correction

More information