
FORSCHUNGSZENTRUM JÜLICH GmbH
Zentralinstitut für Angewandte Mathematik
D-52425 Jülich, Tel. (02461) 61-6402

Interner Bericht

Particle Simulations on Cray MPP Systems

Christian M. Dury*, Renate Knecht, Gerald H. Ristow*

FZJ-ZAM-IB-9714
September 1997 (last revision: )

(*) Fachbereich Physik, Philipps-Universität Marburg, Renthof 6, D Marburg, Germany

Third European CRAY-SGI MPP Workshop, Paris


Particle Simulations on Cray MPP Systems

Christian M. Dury (a), Renate Knecht (b), Gerald H. Ristow (a)

(a) Fachbereich Physik, Philipps-Universität Marburg, Renthof 6, D Marburg, Germany, dury@mailer.uni-marburg.de, ristow@physik.uni-marburg.de
(b) Zentralinstitut für Angewandte Mathematik, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany, r.knecht@fz-juelich.de

Abstract

Particle simulations were among the first applications to be implemented on scalar computers over forty years ago, and they have since played an important role in many science and engineering applications. Because of the inherent parallelism in all particle algorithms, the advent of parallel computers has revolutionized this field: basically, the same set of calculations has to be performed for every particle in the system. At present, realistic simulations with a few million particles are possible using large, general-purpose parallel computers. In this paper the parallel simulation of the size segregation of a binary mixture of granular materials in a half-filled three-dimensional rotating drum, using the discrete element method with linear contact forces, is investigated. Performance results of an implementation in Fortran 90 using MPI for data communication on CRAY T3D, CRAY T3E-600, and CRAY T3E-900 are presented. These results have been obtained with the help of the Cray tools MPP Apprentice and the performance analysis tool PAT as well as the message passing visualization tool VAMPIR developed at the Research Centre Jülich.

1 Introduction

The study of granular materials has long been an active field of research, partly because of the many interesting physical phenomena that granular materials give rise to and partly because of their importance for industrial applications [1, 6]. With the advent of more powerful computers, many scientists and engineers believe that some of the phenomena known in this field can be better understood through well-planned computer simulations [2]. This belief rests on the premise that these phenomena are collective or emergent in nature, i.e., the constituent grains experience simple, well-understood interactions with each other, but unexpected behavior emerges due to the large number of grains involved. Hence, if the grain-grain interactions can be programmed efficiently enough that a sufficiently large system can be simulated, it should be possible to study phenomena which are still poorly understood.

In the past such simulations were performed on vector computers. However, since most of the computational time is spent calculating the collision forces acting between the particles, this limits the consideration of interactions to those which can be vectorized. For many problems of scientific interest this limitation is not very restrictive, especially if one ignores factors like the price/performance ratio of the computation.

One of the promises of massively parallel computers consisting of scalar or super-scalar processors is the ability to perform cost-effective simulations of systems with more complicated and more realistic interactions.

Parallelization techniques for particle algorithms depend upon the range of the particle interactions and the number of particles. For short-range interactions and simulations with more than a few thousand particles, the link-cell approach, a form of domain parallelization, is the most appropriate choice. This method divides the physical space into small cells and assigns each particle to a given cell. If the cell size is larger than the particles' interaction radius, then only the neighboring cells need to be checked in order to find all possible collision partners. Parallelization is then accomplished by allocating all cells within a given physical domain to a given processor. For homogeneous systems and systems where fluctuations in the particle density are small, a static allocation of the domains to the processors is adequate. In the general case, however, statically allocated partitions lead to poorly balanced computing loads. This problem can be overcome by mapping the domains to the processors dynamically [7].

Nevertheless, the basic physical understanding of granular materials is far from complete. One of their most intriguing properties is the tendency to segregate, which is observed in many industrial particle-handling situations, such as transporting grains or mixing pharmaceutical pills. The rotating-cylinder geometry is an archetype of numerous devices used in industrial material processing, where radial segregation can occur on short time scales and axial segregation is observed on longer time scales. The mechanism of the segregation process is based on the surface flow: small particles get stuck along the inclined surface more easily than the larger ones and hence accumulate near the center of the rotating drum (see figure 1). Many parameters influence radial segregation and mixing, such as particle size, shape, mass, frictional forces, angular velocity, the filling of the drum, etc.

Figure 1: 2D drum; small particles are drawn as filled circles and large particles as open circles. (a) Snapshot of the drum right before the first avalanche. (b) Snapshot of the drum after rotating for t = 60 s with angular velocity ω = 1.0 Hz, i.e. after 9 rotations.

2 Parallelization

Distinct element simulations are based upon the use of distinct, individual elements, each of which is free to move according to some given rules [3]. For granular materials, the most important interactions are the inelastic, soft-sphere collisions. For such short-range interactions, the link-cell algorithm is the most efficient programming technique [2] (see figure 2). This method starts by dividing the physical space into either square or cubic cells, depending on the underlying dimension of the physical space, with a side length R_L. For polydisperse systems, i.e. systems with particles of varying diameter, one normally takes R_L = R_max + ε, where ε is a small positive number and R_max is the diameter of the largest particle. For the monodisperse case, where all particles have the same diameter, it is more efficient to take R_L = R_max − ε [5].

Figure 2: Link-cell algorithm (2-D)

Once the space has been sectioned, all particles whose physical coordinates lie inside a given cell are placed into a linked list associated with that cell (see figure 3). The problem of finding all particles colliding with a given particle is then reduced to searching over all neighboring cells for the case R_L > R_max. (In practice one searches only over half the neighboring cells because the collisions are symmetric.) All interacting particle pairs can now be placed into a list which can then be processed efficiently in order to determine the forces acting on each particle due to the collisions. Usually one tries to find an ε such that this list needs to be recreated only every 10th time step at most. After the forces acting on each element are calculated, the Hamiltonian equations of motion are integrated to find the new position of each particle. Normally a simple leap-frog integration method suffices; however, predictor-corrector schemes are also in widespread use [1]. Typically, the time spent integrating the equations of motion is negligible compared to the time needed for calculating the particle interactions.

In this paper the parallel simulation of the size segregation of a binary mixture of granular material in a half-filled three-dimensional rotating drum using the distinct element method with linear contact forces is investigated. The rotation axis in this study is the x-axis, and the cylinder is parallelized along this axis (see figure 4).
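
To make the link-cell bookkeeping described above concrete, the following Fortran 90 fragment is a minimal two-dimensional sketch of the cell-list construction, the neighbor-cell pair search, and a leap-frog update with a generic linear spring-dashpot contact force. It is illustrative only and is not the code used in this study; the particle number, cell counts, and the constants k_n and gam_n are assumed values.

    ! Minimal 2-D link-cell sketch (illustrative only; names and constants are assumed).
    program linkcell_sketch
      implicit none
      integer, parameter :: np = 400, ncx = 20, ncy = 20
      real, parameter :: lx = 1.0, ly = 1.0          ! box size; lx/ncx equals the cell side r_l
      real, parameter :: r_max = 0.045, eps = 0.005  ! largest particle diameter and skin
      real, parameter :: r_l = r_max + eps           ! polydisperse choice R_L = R_max + eps
      real, parameter :: k_n = 1.0e4, gam_n = 5.0    ! linear spring / dashpot constants (assumed)
      real, parameter :: dt = 1.0e-4                 ! time step
      real    :: x(np), y(np), vx(np), vy(np), rad(np), fx(np), fy(np)
      integer :: head(ncx,ncy)                       ! first particle of each cell (0 = empty cell)
      integer :: next(np)                            ! next particle in the same cell (cf. figure 3)
      integer :: p, q, ix, iy, jx, jy
      real    :: dx, dy, dist, overlap, nx, ny, vn, fn

      call random_number(x); call random_number(y)
      x = x*lx; y = y*ly; vx = 0.0; vy = 0.0; rad = 0.5*r_max

      ! build the linked cell lists (cf. figures 2 and 3)
      head = 0
      do p = 1, np
         ix = min(ncx, int(x(p)/r_l) + 1)
         iy = min(ncy, int(y(p)/r_l) + 1)
         next(p) = head(ix,iy)
         head(ix,iy) = p
      end do

      ! pair search: for every particle, scan its own and the neighboring cells;
      ! the condition q > p counts each pair exactly once (a production code visits
      ! only half the neighboring cells, exploiting the symmetry of the collisions)
      fx = 0.0; fy = 0.0
      do iy = 1, ncy
       do ix = 1, ncx
        p = head(ix,iy)
        do while (p > 0)
         do jy = max(1,iy-1), min(ncy,iy+1)
          do jx = max(1,ix-1), min(ncx,ix+1)
           q = head(jx,jy)
           do while (q > 0)
            if (q > p) then
             dx = x(q) - x(p); dy = y(q) - y(p)
             dist = sqrt(dx*dx + dy*dy)
             overlap = rad(p) + rad(q) - dist
             if (overlap > 0.0 .and. dist > 0.0) then     ! inelastic soft-sphere contact
              nx = dx/dist; ny = dy/dist                  ! contact normal from p to q
              vn = (vx(q)-vx(p))*nx + (vy(q)-vy(p))*ny    ! normal relative velocity
              fn = k_n*overlap - gam_n*vn                 ! linear contact force, normal part
              fx(q) = fx(q) + fn*nx; fy(q) = fy(q) + fn*ny
              fx(p) = fx(p) - fn*nx; fy(p) = fy(p) - fn*ny
             end if
            end if
            q = next(q)
           end do
          end do
         end do
         p = next(p)
        end do
       end do
      end do

      ! leap-frog update (unit mass): v(t+dt/2) = v(t-dt/2) + f(t)*dt, x(t+dt) = x(t) + v(t+dt/2)*dt
      vx = vx + fx*dt; vy = vy + fy*dt
      x  = x + vx*dt;  y  = y + vy*dt
      print *, 'largest contact force: ', maxval(sqrt(fx*fx + fy*fy))
    end program linkcell_sketch

A three-dimensional code follows the same structure with cubic cells and additional tangential force terms, and reuses the pair list over several time steps as described above.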

Figure 3: Linked list; each cell (i, j) holds a pointer to the first of its particles (n1, n2, n3, ...), which are chained together by per-particle pointers.

Each processing element (PE) owns the data of its local particles as well as the data of the halo regions, which contain the particles from the neighboring PEs. The particles in these halo cells are not updated; rather, their positions are used for the force calculations of the particles in the true cells. During the course of the simulation, particles will migrate outside of the spatial region controlled by the processor on which they reside. Such particles need to be removed from the list in which they are registered and transmitted to the appropriate processor, where they are then registered. The performance measurements presented here have been performed without dynamic load balancing, because in this application the flow of particles from one PE to another is approximately balanced in both directions, so no accumulation of particles on a single PE can occur.

Figure 4: Parallelization of the 3D drum (domain decomposition along the rotation axis)

This approach has been implemented in Fortran 90 using MPI [8] for data communication on the systems CRAY T3D, CRAY T3E-600, and CRAY T3E-900 [9]. The numerical methods and the parameters used, as well as a quantitative analysis of the segregation for different rotational velocities, are described in [4].
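
As an illustration of the migration step described above, the Fortran 90/MPI sketch below moves particles that have left the local slab of the one-dimensional decomposition to the right-hand neighbor; the counts are exchanged first so that the receiver knows how many particle records to expect. This is a simplified sketch, not the code benchmarked in this report: the buffer sizes, the periodic neighbor arrangement, and the reduction of a particle record to its x coordinate are assumptions made for brevity.

    ! Sketch of particle migration between neighboring PEs (illustrative only).
    program migrate_sketch
      implicit none
      include 'mpif.h'
      integer, parameter :: maxp = 10000
      real    :: x(maxp), sendbuf(maxp), recvbuf(maxp)
      real    :: slab, x_lo, x_hi
      integer :: np_local, nsend, nrecv, i, k
      integer :: rank, nprocs, left, right, ierr
      integer :: status(MPI_STATUS_SIZE)

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

      ! each PE owns the slab [x_lo, x_hi) of a drum of unit length along the rotation axis
      slab  = 1.0/real(nprocs)
      x_lo  = rank*slab
      x_hi  = (rank+1)*slab
      right = mod(rank+1, nprocs)               ! periodic neighbors, for simplicity only
      left  = mod(rank-1+nprocs, nprocs)

      ! some local particles, a few of which have drifted beyond x_hi during the last step
      np_local = 100
      call random_number(x(1:np_local))
      x(1:np_local) = x_lo + 1.05*slab*x(1:np_local)

      ! remove emigrants from the local list and pack them into a send buffer
      nsend = 0
      k = 0
      do i = 1, np_local
         if (x(i) >= x_hi) then
            nsend = nsend + 1
            sendbuf(nsend) = x(i)
         else
            k = k + 1
            x(k) = x(i)
         end if
      end do
      np_local = k

      ! exchange the counts first, then the particle data themselves
      call mpi_sendrecv(nsend, 1, MPI_INTEGER, right, 0, &
                        nrecv, 1, MPI_INTEGER, left,  0, &
                        MPI_COMM_WORLD, status, ierr)
      call mpi_sendrecv(sendbuf, nsend, MPI_REAL, right, 1, &
                        recvbuf, nrecv, MPI_REAL, left,  1, &
                        MPI_COMM_WORLD, status, ierr)

      ! register the newly arrived particles in the local list
      x(np_local+1:np_local+nrecv) = recvbuf(1:nrecv)
      np_local = np_local + nrecv

      print *, 'PE', rank, 'now owns', np_local, 'particles'
      call mpi_finalize(ierr)
    end program migrate_sketch

A full implementation would ship the complete particle record (position, velocity, radius, species) in both directions and would handle the exchange of the halo cells analogously.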

3 Performance Investigations

Performance measurements have been carried out on a CRAY T3D, on a CRAY T3E-600 with stream buffers disabled, and on a CRAY T3E-900 with stream buffers enabled and disabled, respectively. External stream buffers in a CRAY T3E system are used to maximize local memory bandwidth, leading to better performance for vector-like data references. The CRAY T3E-600 at the Research Centre Jülich is equipped with the older PE modules; a hardware design problem in the memory control chip may lead to stability problems of the system when the stream buffers are activated. They are therefore disabled and may not be activated through user-controlled environment variables. The characteristics of the Cray MPP systems used here are shown in table 1.

                              T3D             T3E-600      T3E-900
Processor                     DEC Alpha EV4   EV5          EV5
Clock                         150 MHz         300 MHz      450 MHz
MFLOPS (peak performance)     150             600          900
3D torus clock                150 MHz         150 MHz      150 MHz
3D torus link bandwidth       300 MB/s        500 MB/s     500 MB/s
Primary cache                 8 KB            8 KB         8 KB
Secondary cache               -               96 KB        96 KB
Memory bandwidth              300 MB/s        1200 MB/s    1200 MB/s

Table 1: Cray MPP systems characteristics

On the CRAY T3E-600 the processor clock rate is doubled in comparison to the CRAY T3D. Furthermore, the CRAY T3E processor can perform 2 operations per clock period as opposed to 1 operation on a CRAY T3D, which yields the peak rates of 600 and 900 MFLOPS quoted in table 1, compared to 150 MFLOPS for the T3D. On the CRAY T3E applications can additionally benefit from the secondary cache, which is not available on the CRAY T3D.

The application's performance has been investigated using the Cray tools MPP Apprentice and the Performance Analysis Tool (PAT) as well as the message passing visualization tool VAMPIR (Visualization and Analysis of MPI Resources) developed at the Research Centre Jülich. MPP Apprentice and PAT can be used to identify the most time-consuming routines. MPP Apprentice assists the user in determining the performance characteristics of a parallel application on a CRAY T3D or T3E system and gives some indication of the causes of the observed behavior. Due to the large overhead induced by the MPP Apprentice run-time library, the given timings are only an indication of the real execution times; moreover, the reported MFLOPS or integer operation counts cannot be used to measure the real performance.

To provide more thorough information, the PAT performance analysis tool is available on CRAY T3E systems. PAT uses hardware performance counters and the profil(2) system call of UNICOS/mk. It provides a fast, low-overhead method for estimating the amount of time consumed in procedures, determining load balance across PEs, generating and viewing trace files, timing individual calls to routines, and displaying hardware performance counter information. A program that gathers PAT performance data runs much faster than a program instrumented to collect performance data for MPP Apprentice: on average, a program instrumented for MPP Apprentice runs about three times slower than the uninstrumented program.

VAMPIR, on the other hand, provides detailed information on the message passing communication and the load balancing on the PEs. VAMPIR translates a trace file generated on a Cray MPP system at runtime into a variety of graphical views, e.g. state diagrams, activity charts, time-line displays (see figure 5), and statistics. Time-line displays are helpful for getting an overview of the load balancing of the program. Colors are used to represent different kinds of activities; in this example MPI routines are shown in blue whereas the computation part is shown in green. Zooming is possible to analyze the program at any level of detail, and each message sent from one PE to another can be identified. The execution time for one iteration in this example is about 24 ms and can be determined using a VAMPIR popup menu.

To generate trace data with the current version, the source code has to be instrumented with calls to a run-time library. A future version of PAT will be capable of object code instrumentation, which makes the usage of VAMPIR independent of preprocessors for special programming languages.

Figure 5: Time-line display showing one iteration out of 200 with particles on 16 PEs of a CRAY T3E-900 (stream buffers activated)

Table 2 shows the measured execution times of the application without I/O. 200 iterations of the simulation were performed for a drum with particles on 16 PEs. The performance gain of the CRAY T3E-900 over the CRAY T3E-600 can be at most 50 % because of the higher clock rate; in addition, the stream buffer usage may speed up the program further. The upper window in figure 6 shows the sum of the execution times of all user routines on 16 PEs of a CRAY T3D in comparison to a CRAY T3E-600 without stream buffer usage. As mentioned above, the most time-consuming routine is the computation of the particle-particle interactions; this part of the program is about 3 times faster on the CRAY T3E-600. The window below displays the sum of the MPI routines, showing a considerable amount of synchronization overhead (MPI barrier).

                                       T3D   T3E-600        T3E-900        T3E-900
                                             (streams off)  (streams off)  (streams on)
Execution time                               5.86 s         5.19 s         4.39 s
Speedup in relation to CRAY T3D
Speedup in relation to CRAY T3E-600

Table 2: 200 iterations with particles on 16 PEs

In figure 7 the effect of the stream buffer usage can be seen. The overhead induced by MPI communication routines is about the same on both CRAY T3E systems; only the amount of barrier synchronization is reduced, by 50 %, on the CRAY T3E-900. The most time-consuming barrier synchronization occurs at the beginning of the program: PE 0 has to read the input data and broadcast the appropriate subsets to the other PEs, which have to wait until PE 0 has finished this preparatory work.

The performance counters of PAT give about 90 to 100 million integer operations per PE for a large system of particles and 50 iterations, for the whole program including I/O on 32 PEs of a CRAY T3E-600; this is about 16 % of the theoretical peak performance. The measured wall-clock time is about 2.5 minutes for the 50 iterations of the simulation.

4 Summary and Discussion

We have studied the performance of a parallel algorithm simulating the size segregation of a binary mixture of granular materials in a half-filled three-dimensional rotating drum. The algorithm has been implemented on the Cray MPP systems CRAY T3D, CRAY T3E-600, and CRAY T3E-900. The measurements on the CRAY T3E-600 have been performed without stream buffer usage, whereas on the CRAY T3E-900 the effect of the stream buffers has been considered as well. The CRAY T3E-600 is about 2.6 times faster than the CRAY T3D for the application described in this paper. Using a CRAY T3E-900 without stream buffers, a speedup of 11 % can be achieved in comparison to the CRAY T3E-600. Furthermore, for this application the stream buffer usage gives an additional speedup of 15 % compared to a CRAY T3E-900 with the stream buffers not activated; the performance improvement is then about 25 % in comparison to a CRAY T3E-600 with stream buffers disabled. These performance measurements confirm the results which have been observed for other applications and benchmark tests on Cray MPP systems.

Acknowledgements

The authors are grateful to the University of Rostock and the Konrad-Zuse-Zentrum für Informationstechnik Berlin for granting access to their CRAY T3E-900 and CRAY T3D, respectively.

Figure 6: Timings for calculation and MPI overhead on CRAY T3D (upper bars) and CRAY T3E-600 (lower bars) without stream buffer usage, summed up on 16 PEs

Figure 7: Timings for calculation and MPI overhead on CRAY T3E-900 with stream buffers disabled (upper bars) and CRAY T3E-900 with stream buffers enabled (lower bars), summed up on 16 PEs

References

1. M. P. Allen and D. J. Tildesley, Computer Simulations of Liquids, Clarendon Press, Oxford.
2. D. M. Beazley and P. S. Lomdahl, Message-Passing Multi-Cell Molecular Dynamics on the Connection Machine 5, Parallel Computing 20, 2 (1994).
3. P. A. Cundall and O. D. L. Strack, A discrete numerical model for granular assemblies, Géotechnique 29, 1 (1979).
4. C. M. Dury and G. H. Ristow, Radial Segregation in a Two-Dimensional Rotating Drum, Journal de Physique I France 7 (1997).
5. W. Form, N. Ito, and G. A. Kohring, Vectorized and Parallelized Algorithms for Multi-Million Particle MD-Simulations, Int. J. Mod. Phys. C 4 (1993).
6. R. W. Hockney and J. W. Eastwood, Computer Simulation Using Particles, Adam Hilger, Bristol.
7. R. Knecht and G. A. Kohring, Dynamic Load Balancing for the Simulation of Granular Materials, Proceedings of ICS 95, Barcelona, 3-7 July 1995.
8. Message Passing Interface Forum, MPI: A Message-Passing Interface Standard.
9. T3E overview, obtainable from:
