CLUSTER-BASED MOLECULAR DYNAMICS PARALLEL SIMULATION IN THERMOPHYSICS

JIWU SHU (1,*), BING WANG (1), JINZHAO WANG (2), MIN CHEN (2), WEIMIN ZHENG (1)
(1) Dept. of Computer Science and Technology, Tsinghua Univ., Beijing, 100084
(2) Dept. of Engineering Mechanics, Tsinghua Univ., Beijing, 100084
(*) Correspondence should be addressed to Jiwu Shu (shujw@tsinghua.edu.cn)

ABSTRACT

Molecular dynamics (MD) simulation is an important method for research in thermophysics, but it is difficult to carry out with traditional serial algorithms because of the heavy numerical computation involved. This paper proposes a cluster-based spatial decomposition algorithm for large-scale MD simulation in thermophysics. With an efficient domain decomposition strategy and a fast method for locating neighboring particles, we greatly reduce the computation and communication cost and successfully run the MD simulation for the PVT property calculation of a large-scale system with 4,000,000 particles. The spatial decomposition algorithm is implemented on 125 processors with a speedup (Sp) of 86.4 and an efficiency (E) of 69.1%. The numerical results indicate that the proposed parallel algorithm can simulate a thermophysical system with many more particles than before, and that it provides a more efficient way to simulate thermophysical problems on a computer.

KEY WORDS

Parallel computing, Molecular Dynamics, Cluster, Thermophysics.

1. INTRODUCTION

Recently, more and more researchers have been attracted by microscale problems in thermophysics, which include atomic beam bombardment, the liquid-vapor interface and nucleation. These problems occur on microscopic time and space scales, so their processes are very difficult to control. Up to now, no traditional experimental method can measure these processes directly and accurately [1].
Molecular Dynamics (MD) simulation provides a new method for research in microscale thermophysics, by which researchers can understand these problems at the level of molecules and atoms. The basic principle of MD is to solve Newton's equations for the particles of the simulated system, follow the microscale evolution of the system, and then compute statistics of its macroscopic parameters.

Two characteristics stand out when MD simulation is used to study microscale thermophysics. Firstly, the simulation must handle a great number of particles; secondly, it must run for a very large number of time steps (millions or tens of millions). Furthermore, MD simulation must solve the equations of motion of many particles, which requires complex numerical calculation. All of these factors lead to an enormous amount of computation, especially when MD simulation is applied to a very large system. Researchers have spent much effort simplifying MD models and improving MD algorithms, but these fast methods cannot provide a satisfactory solution for the MD simulation of large-scale systems. A thorough solution must rely on parallel computer systems and on efficient, scalable parallel algorithms. Up to now, however, little work has been done to apply parallel MD simulation to microscale thermophysics research, and the parallel algorithms are not well developed. Designing an efficient and scalable parallel MD simulation algorithm is therefore very useful to the research of microscale thermophysics.

In recent years, Cluster technology has matured and commodity components have been widely adopted, so that the price-to-performance ratio of a Cluster is much lower than that of PVP and MPP systems.
Cluster systems are widely used internationally in scientific and engineering computing.

This paper proposes a new spatial decomposition MD algorithm for Cluster systems, which can efficiently simulate a large-scale multi-particle thermophysical system. Firstly, we provide three different domain division strategies and analyze their efficiency and scalability. Secondly, we propose a method called FLNP (Fast Location of Neighboring Particles) to accelerate the location of neighboring particles, which greatly reduces the cost of interaction calculation and communication. The FLNP method combines the advantages of LinkCell and NeighborList and performs well in practical simulation. With this algorithm and these strategies, we simulate a large system with 4,000,000 particles and obtain satisfactory results on the PVT property calculation of this system.

2. MD SIMULATION MODEL

In our work, we simulate a multi-particle system and calculate its PVT properties. The $N$ particles are simulated in a 3D cubic space with periodic boundary conditions, at a state point defined by the reduced density $\rho^*$ and the reduced temperature $T^*$. The simulation starts with the particles on an fcc lattice with randomized velocities. A roughly uniform spatial density persists for the duration of the simulation. The simulation is run at constant $N$, volume $V$ and temperature $T$, which gives a statistical sampling from the canonical ensemble.

The computational task in MD simulation is to solve Newton's equations [2], given by

$$m_i \frac{d\mathbf{v}_i}{dt} = \mathbf{F}_i(t) = \sum_{j} F_2(\mathbf{r}_i, \mathbf{r}_j) + \sum_{j}\sum_{k} F_3(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k) + \cdots, \qquad \frac{d\mathbf{r}_i}{dt} = \mathbf{v}_i \tag{1}$$

where $m_i$ is the mass of particle $i$, and $\mathbf{r}_i$ and $\mathbf{v}_i$ are its position and velocity vectors. $F_2$ is a force function describing pair-wise interactions. $F_k$ ($k \ge 3$) describes multi-body interactions, which are ignored in our simulation.

The most time-consuming part of MD simulation is the calculation of interactions, which usually requires 90% of the total simulation time. The force terms in Equation (1) are typically non-linear functions of the distance between particle $i$ and the other particles. In our simulation, the interaction is modeled with the Lennard-Jones potential energy

$$\varphi(r) = 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right] \tag{2}$$

where $r$ is the distance between two interacting particles, and $\varepsilon$ and $\sigma$ are constants.

In a long-range model, each particle interacts with all the other $(N-1)$ particles, which leads to a computational complexity of $O(N^2)$. But many physical problems can be modeled with short-range interactions; that is, the summations in Equation (1) can be restricted to particles within some small region surrounding particle $i$. We implement this with a cutoff distance $r_c$, outside of which all interactions are ignored. In this case, the complexity of the interaction calculation is reduced to $O(N)$.
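As an illustration of the truncated Lennard-Jones interaction of Equation (2), the following is a minimal sketch rather than the authors' implementation. The function names, the use of reduced units ($\varepsilon = \sigma = 1$) as defaults, and the plain truncation at `r_cut` (no shifting or smoothing) are assumptions made for this example; the default `r_cut=4.0` mirrors the cutoff of $4.0\sigma$ used in this paper.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0, r_cut=4.0):
    """Lennard-Jones pair energy of Equation (2), set to zero at and beyond r_cut."""
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0, r_cut=4.0):
    """Magnitude of -d(phi)/dr; a positive value means the pair repels."""
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)          # location of the potential minimum
    print(lj_potential(r_min), lj_force(r_min), lj_force(5.0))
```

The force vanishes at $r = 2^{1/6}\sigma$, where the potential reaches its minimum of $-\varepsilon$, and every pair farther apart than the cutoff contributes nothing, which is what makes the $O(N)$ neighbor search worthwhile.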
In our simulation, the cutoff distance is $4.0\sigma$. How to minimize the number of neighboring particles that must be checked for possible interactions is an important problem, because it strongly influences the speed of short-range MD simulation.

3. PARALLEL ALGORITHM AND OPTIMIZING STRATEGY

In the past twenty years, researchers have developed three classes of parallel MD simulation algorithms.

The first class is Atom Decomposition (AD) [3]. This kind of algorithm gives a pre-determined subgroup of particles to each processor, and each processor calculates the interactions of its own particles and updates their velocities and positions. The biggest shortcoming of this algorithm is its enormous memory requirement, because each processor must maintain the positions of all the particles. The AD algorithm gives reasonably good performance only for MD systems with a small number of particles on shared-memory machines.

The second class is Force Decomposition (FD) [3], in which each processor is assigned two subgroups of particles and calculates the interactions between these two groups. This kind of algorithm does not need the positions of all particles, so it requires much less memory than the AD algorithm. But the FD algorithm cannot maintain load balance as easily as the AD algorithm can: it achieves a good load balance only when the force matrix has a uniform sparse structure.

The last class is Spatial Decomposition (SD) [4]. The whole simulation domain is divided into as many sub-domains as there are processors, and each sub-domain is assigned to one processor. Each processor computes only the forces on the particles in its sub-domain. The main benefit of SD is that it takes full advantage of the local nature of the inter-particle forces and performs only local communication. Thus, in large-scale MD simulation, it achieves the optimal $O(N/P)$ scaling and performs better on a Cluster than the AD and FD algorithms.
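The assignment of particles to sub-domains in Spatial Decomposition can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the paper: the function names `owner_rank` and `partition`, the regular `grid` of equally sized sub-domains, and the row-major rank numbering are all hypothetical choices made for the example.

```python
def owner_rank(pos, box, grid):
    """Return the rank of the sub-domain containing position pos.

    pos  -- (x, y, z) coordinates of one particle
    box  -- (Lx, Ly, Lz) edge lengths of the periodic simulation box
    grid -- (px, py, pz) number of sub-domains along each axis (assumed layout)
    """
    idx = []
    for x, length, n in zip(pos, box, grid):
        x = x % length                          # periodic wrap into [0, length)
        idx.append(min(int(x / length * n), n - 1))  # guard against rounding up to n
    ix, iy, iz = idx
    px, py, pz = grid
    return (ix * py + iy) * pz + iz             # row-major rank numbering


def partition(positions, box, grid):
    """Group particle indices by the rank that owns them."""
    owned = {rank: [] for rank in range(grid[0] * grid[1] * grid[2])}
    for i, pos in enumerate(positions):
        owned[owner_rank(pos, box, grid)].append(i)
    return owned


if __name__ == "__main__":
    box = (10.0, 10.0, 10.0)
    pts = [(0.1, 0.1, 0.1), (9.9, 9.9, 9.9), (6.0, 2.0, 7.0)]
    print(partition(pts, box, (2, 2, 2)))
```

Each processor would then loop only over the indices in its own list, which is the source of the $O(N/P)$ work per processor mentioned above.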
Therefore, we chose the SD algorithm as our simulation method, and we propose three kinds of domain division strategies [5] to make the SD algorithm more efficient and more scalable.

3.1 Domain division strategies

Figure 1. Domain division strategies: (a) 1-dimension, (b) 2-dimension, (c) 3-dimension.

Figure 1 shows the three typical strategies of domain division that we propose. For the convenience of discussion, suppose the whole simulation domain is divided into $n$ sub-domains $\Sigma_i$ ($i = 1, 2, \ldots, n$), and that the number of processors is also $n$, denoted $P_i$ ($i = 1, 2, \ldots, n$). The $n$ sub-domains are assigned to the $n$ processors one to one. That is, processor $P_i$ computes the interactions on the particles in sub-domain $\Sigma_i$ and updates their positions and velocities. Below we discuss the differences between these three strategies in load balance, communication and scalability.

Firstly, we discuss the load balance of the three strategies. The 1-dimension division shown in Figure 1(a) achieves the best load balance, for two reasons.

(1) Load imbalance is caused mainly by the non-uniformity of the particle density, which affects the 1-dimension division least of the three strategies. Suppose the simulation domain extends along $x$, $y$ and $z$, and the 1-dimension division of Figure 1(a) is applied along $x$. Then only non-uniformity of the particle density in the $x$ direction can influence the load balance of the 1-dimension division. By contrast, the load balance of the 3-dimension division of Figure 1(c) can be strongly influenced no matter in which dimension the non-uniformity occurs.

(2) The algorithm with the 1-dimension division strategy can implement dynamic load balancing more easily than that with the 3-dimension division strategy. The communication architecture of the 1-dimension division is simpler than that of the 3-dimension division, so the algorithm with the 1-dimension division can easily re-divide the sub-domains locally or globally when the particle density changes. The algorithm with the 3-dimension division, on the other hand, cannot implement dynamic load balancing easily because of its complex communication.

The communication cost in parallel algorithms is determined mainly by two factors: how much data must be transferred, and how many times communication happens. The smaller the data volume and the number of communications, the lower the communication cost. There are two kinds of communication in the Spatial Decomposition algorithm.

(1) When a particle moves from sub-domain $\Sigma_i$ to sub-domain $\Sigma_j$, processor $P_i$ must send all the information of this particle to processor $P_j$. This kind of communication is usually called particle move, and is illustrated in Figure 2(a).
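The particle-move step described above can be sketched for the 1-dimension division as follows. This is a hedged illustration, not the paper's code: the function name `split_leavers`, the equal-width slabs along $x$, and the periodic left and right neighbors are assumptions, and a real implementation would hand the two outgoing lists to a message-passing library instead of returning them.

```python
def split_leavers(particles, rank, n_ranks, box_x):
    """Separate particles that stay on `rank` from those that crossed to a neighbor.

    particles -- list of (x, y, z) positions after the position update
    rank      -- index of this processor's slab along x (assumed 1-D layout)
    n_ranks   -- total number of slabs
    box_x     -- length of the periodic box along x
    """
    width = box_x / n_ranks
    left = (rank - 1) % n_ranks
    right = (rank + 1) % n_ranks
    stay, to_left, to_right = [], [], []
    for p in particles:
        x = p[0] % box_x                               # periodic wrap
        owner = min(int(x / width), n_ranks - 1)
        if owner == rank:
            stay.append(p)
        elif owner == right:
            to_right.append(p)
        elif owner == left:
            to_left.append(p)
        else:
            # A particle that skips a whole slab in one step indicates
            # a time step that is too large for this decomposition.
            raise ValueError("particle moved past a neighboring sub-domain")
    return stay, to_left, to_right


if __name__ == "__main__":
    moved = [(2.5, 0.0, 0.0), (4.1, 1.0, 1.0), (1.9, 2.0, 2.0)]
    print(split_leavers(moved, rank=1, n_ranks=4, box_x=8.0))
```

Because a particle travels much less than one sub-domain width per time step, the outgoing lists only ever target the two adjacent slabs.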
The communication of particle move is simple because it always happens between neighboring processors.

(2) The calculation of the interactions on particles located near the boundary of a sub-domain requires the positions of other particles that may belong to another processor. This leads to an exchange of particle positions called boundary copy, illustrated in Figure 2(b). In short-range MD simulation, the exchange involves only those particles whose distance to the boundary is within the cutoff distance $r_c$.

Figure 2. Communications in the parallel SD algorithm: (a) particle move, (b) boundary copy.

We compare the communication cost of the three domain division strategies in terms of both the number of communications and the communication data volume.

(1) Under the 1-dimension division, processor $P_i$ need only communicate with its two neighboring processors $P_{i-1}$ and $P_{i+1}$, and there are at most two communications in each time step. Under the 2-dimension division, processor $P_i$ must communicate with 8 neighboring processors, which requires 4 communications even when using the fold [3] technique. Under the 3-dimension division, the number of neighboring processors is 26 and the number of communications is 6. Thus, if we consider only the number of communications, the 1-dimension division has the lowest communication cost and the 3-dimension division the highest.

(2) The main task of communication is boundary copy, so we analyze the data volume of boundary copy. For the convenience of discussion, suppose that the simulated system has uniform density. The data volume can then be expressed as

$$C_1 = \left(\frac{N}{\rho}\right)^{2/3} \cdot 2\rho P, \qquad C_2 = \left(\frac{N}{\rho}\right)^{2/3} \cdot 4\rho P^{1/2}, \qquad C_3 = \left(\frac{N}{\rho}\right)^{2/3} \cdot 6\rho P^{1/3} \tag{4}$$

where $C_1$, $C_2$ and $C_3$ are the total communication data volumes of the 1-dimension, 2-dimension and 3-dimension division strategies respectively, $N$ is the number of particles in the simulated system, $P$ is the number of processors of the Cluster system, and $\rho$ is the number of particles in each box. We have

$$C_1 : C_2 : C_3 = P : 2P^{1/2} : 3P^{1/3} \tag{5}$$

In particular, when $P$ is large enough (from Equation (5), $P \ge 12$ is sufficient, a condition that is easily achieved), we have

$$C_1 > C_2 > C_3 \tag{6}$$

Equation (6) shows that the total communication data volume of the 3-dimension division is the smallest and that of the 1-dimension division the greatest. Further experimental results show that the communication data volume is the dominant factor in the total communication cost of large-scale parallel MD simulation. Thus the total communication cost of the 3-dimension division is the lowest of the three domain division strategies.

Finally, we compare the scalability of the three strategies. Generally speaking, the 3-dimension division strategy scales better than the 1-dimension division, for two reasons.

(1) Equation (4) shows that as $P$ becomes larger, the communication cost per processor ($C/P$) decreases rapidly with the 3-dimension division but remains constant with the 1-dimension division. Thus, as more and more processors are used, the algorithm with the 1-dimension division becomes more and more inefficient. The 1-dimension division strategy limits the scalability of the parallel algorithm.

(2) When $N$ is fixed, the number of sub-domains in SD algorithms is limited by the boundary copy. In detail, a sub-domain must be longer than an individual box in the dividing direction; otherwise the communication becomes more complex. The length of a box is $r_c$, or $r_s$ when using the FLNP method, which is discussed below. Because the 1-dimension strategy can apply the division in only one direction, its number of sub-domains is limited the most. So the maximal number of processors that can be used with the 1-dimension division is much smaller than with the 2-dimension and 3-dimension divisions.

From the discussion in (1) and (2), we conclude that the algorithm with the 3-dimension division strategy is the most scalable MD simulation algorithm on a Cluster. Taking the whole discussion above together, when the load balance is well maintained, the 3-dimension division is the best domain decomposition strategy for parallel MD simulation on a Cluster.

3.2 The FLNP method

In short-range MD simulation of a system with $N$ particles, we need not check all of the other $(N-1)$ particles in order to calculate the force on particle $i$, because only those particles within the cutoff distance $r_c$ contribute to the force on particle $i$. There are two basic techniques used to accomplish this: LinkCell [3] and NeighborList.

In the LinkCell method, the simulation domain is divided into many 3D cells of side length $d$, where $d$ is equal to $r_c$ or slightly larger, as illustrated in Figure 3, and each particle is mapped to a cell. This reduces the task of finding the neighbors of a given particle to checking 27 cells, that is, the cell containing the particle and the 26 surrounding ones. Since mapping the particles to cells requires only $O(N)$ work, the original $O(N^2)$ work required by the force calculation is greatly reduced.

Figure 3. LinkCell.
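The LinkCell search described above can be sketched in a few lines. This is a minimal serial illustration under stated assumptions, not the authors' implementation: it assumes a cubic periodic box of edge `box`, stores the cells in a dictionary, and the names `build_cells`, `min_image_dist2` and `pairs_within_cutoff` are hypothetical.

```python
import itertools


def build_cells(positions, box, r_cut):
    """Map particle indices to cubic cells whose side is at least r_cut."""
    n_cells = max(1, int(box / r_cut))          # number of cells per edge
    side = box / n_cells
    cells = {}
    for idx, p in enumerate(positions):
        key = tuple(min(int((c % box) / side), n_cells - 1) for c in p)
        cells.setdefault(key, []).append(idx)
    return cells, n_cells


def min_image_dist2(a, b, box):
    """Squared distance between a and b using the nearest periodic image."""
    total = 0.0
    for x, y in zip(a, b):
        d = (x - y) % box
        d = min(d, box - d)
        total += d * d
    return total


def pairs_within_cutoff(positions, box, r_cut):
    """Return the set of index pairs (i, j), i < j, closer than r_cut."""
    cells, n_cells = build_cells(positions, box, r_cut)
    found = set()
    for (cx, cy, cz), members in cells.items():
        # The cell itself plus its 26 neighbors, with periodic wrap-around.
        nearby = {((cx + dx) % n_cells, (cy + dy) % n_cells, (cz + dz) % n_cells)
                  for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3)}
        for key in nearby:
            for i in members:
                for j in cells.get(key, ()):
                    if i < j and min_image_dist2(positions[i], positions[j], box) < r_cut ** 2:
                        found.add((i, j))
    return found
```

Only particles in the 27 surrounding cells are ever compared, so the cost per particle depends on the local density and not on $N$.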
In the NeighborList technique, a list of neighboring particles is maintained for each particle. It includes all of the particles that may contribute to the force on the given particle, as illustrated in Figure 4. When the list is built, all of the nearby particles within an extended cutoff distance $r_s = r_c + \delta$ are stored. The list is used to calculate interactions for a few time steps. Then, before any particle could have moved from a distance $r > r_s$ to $r < r_c$, the list is rebuilt. The advantage of the NeighborList method is that, once the list has been built, checking the possible neighbors in the list is much faster than checking all particles in the simulated system. However, building and rebuilding the list still requires checking all of the simulated particles.

Figure 4. NeighborList.

Based on the analysis of LinkCell and NeighborList, we propose a new speedup technique called FLNP (Fast Location of Neighboring Particles). With this method, the whole simulated domain is divided into many cells with side length $r_s$, not $r_c$, and at the same time a neighbor list is maintained for each particle. This new method has clear advantages over the basic LinkCell and NeighborList techniques. Firstly, compared with the LinkCell method, it reduces the number of particles that must be checked, because there are far fewer particles in a sphere of volume $\frac{4}{3}\pi r_s^3$ than in a cube of volume $27 r_c^3$. Secondly, compared with the basic NeighborList method, there is a significant saving when the list is rebuilt, because the checking volume is reduced from the whole simulated domain to $27 r_s^3$.

$\delta$, which determines the relation between $r_c$ and $r_s$, is an important parameter of the FLNP method and has a significant influence on the efficiency of an algorithm using FLNP. When $\delta$ is too large, the volume of particle checking is enlarged, so the force calculation time increases.
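The combination of cells of side $r_s$ with a per-particle neighbor list can be sketched as follows. This is a serial illustration under stated assumptions, not the authors' FLNP code: the function names are hypothetical, the box is assumed cubic and periodic, and the rebuild test uses the common conservative criterion that no particle has moved more than $\delta/2$ since the last build, which is one way to guarantee that no pair has gone from $r > r_s$ to $r < r_c$.

```python
import itertools


def dist2(a, b, box):
    """Squared minimum-image distance in a cubic periodic box."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % box
        d = min(d, box - d)
        total += d * d
    return total


def build_neighbor_list(positions, box, r_c, delta):
    """Build half neighbor lists (j > i) with cutoff r_s = r_c + delta.

    Candidate pairs come only from the 27 cells of side >= r_s around each
    particle, so a rebuild never scans the whole domain.
    """
    r_s = r_c + delta
    n = max(1, int(box / r_s))                   # cells per edge
    side = box / n
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(min(int((c % box) / side), n - 1) for c in p)
        cells.setdefault(key, []).append(i)
    neighbors = [[] for _ in positions]
    for (cx, cy, cz), members in cells.items():
        nearby = {((cx + dx) % n, (cy + dy) % n, (cz + dz) % n)
                  for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3)}
        for key in nearby:
            for i in members:
                for j in cells.get(key, ()):
                    if i < j and dist2(positions[i], positions[j], box) < r_s * r_s:
                        neighbors[i].append(j)
    return neighbors


def needs_rebuild(positions, reference, box, delta):
    """True once any particle has moved more than delta/2 since `reference`."""
    limit2 = (0.5 * delta) ** 2
    return any(dist2(p, q, box) > limit2 for p, q in zip(positions, reference))
```

Between rebuilds, the force loop only visits `neighbors[i]` and applies the $r_c$ test to each stored pair, so the per-step cost is governed by the sphere of radius $r_s$ and not by the 27 surrounding cells.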
Conversely, if $\delta$ is too small, the neighbor list has to be rebuilt frequently, so the advantage of NeighborList is wasted and the efficiency of the algorithm is reduced. Although $\delta$ is always chosen to be small relative to $r_c$, the optimal value depends on the parameters (e.g. temperature, diffusivity, density) of the particular simulation.

The FLNP method not only greatly decreases the volume of particle checking, but also reduces the calculation and communication cost of particle move and boundary copy, for two reasons.

(1) FLNP maintains a neighbor list for each particle, which stores all of the particles that can possibly contribute to the force calculation. In algorithms not using FLNP, when some particles near the boundary move from one sub-domain to another, the neighbor list has to be rebuilt. In algorithms using FLNP, if these moving particles do not enter another particle's extended cutoff distance, the rebuild can be postponed. So the communication cost of particle move can be reduced to some extent.

(2) When copying the boundary, processors must check which particles are near the boundary and must be sent to the neighboring processor. In algorithms not using FLNP, this work must be done at each time step. In algorithms using FLNP, it can be done once every few time steps, when the neighbor lists are rebuilt. During all the other time steps, we simply send the latest positions of the particles that were identified as boundary particles at the last rebuild.

4. RESULTS AND ANALYSIS

The parallel MD algorithm of Section 3 was tested on our Cluster system. This Cluster is made up of 36 SMP nodes. Each node has 4 CPUs of Intel Xeon PIII700, 6Gbytes of hard disk, and Gbytes of memory. The communication medium between SMP nodes is a Myrinet switch with a bandwidth of 2.56 Gb/s. The software environment is Redhat Linux 7.2 (kernel version smp), MPICH-.2.7 and gm-.5pre4, the network protocol running on Myrinet.

4.1 Comparison of Domain Division Strategies

Figure 5(a). Efficiency and speedup of the 1-dimension division (table columns: Number of CPU, Sp, E(%)).

Figure 5(b). Efficiency and speedup of the 2-dimension division (table columns: Number of CPU, Sp, E(%)).

Figures 5(a), 5(b) and 5(c) show the performance of the three kinds of domain division strategies respectively. Generally speaking, the algorithm with the 3-dimension division achieves the highest performance and the one with the 1-dimension division the lowest. Figure 5 also shows that the three domain division strategies have similar parallel efficiency when $P$ is small (say, up to about 9 processors). As more and more processors are used, the efficiency of the 1-dimension division drops quickly, whereas the efficiency of the 2-dimension and 3-dimension divisions declines only slightly. We conclude that the algorithm with the 3-dimension division is the most efficient and most scalable for parallel MD simulation on a Cluster system. The algorithm with the 2-dimension division also scales well, but it is less efficient than that with the 3-dimension division.
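The Sp and E(%) columns are related by the standard definitions of speedup and parallel efficiency, sketched below. The function names are hypothetical, and the definitions $S_p = T_1 / T_P$ and $E = S_p / P$ are the usual ones, assumed here because the paper does not spell them out. The example uses the headline result from the abstract, taking $P = 125$, which is the processor count consistent with a speedup of 86.4 at roughly 69% efficiency.

```python
def speedup(t_serial, t_parallel):
    """Speedup Sp: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(sp, n_procs):
    """Parallel efficiency E: speedup divided by the number of processors."""
    return sp / n_procs


if __name__ == "__main__":
    # Headline figures from the abstract: Sp = 86.4 on 125 processors.
    print(f"E = {100.0 * efficiency(86.4, 125):.1f}%")
```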
The algorithm with the 1-dimension division is the worst one because of its poor efficiency and scalability, and it provides acceptable performance only when $P$ is small.

Figure 5(c). Efficiency and speedup of the 3-dimension division (table columns: Number of CPU, Sp, E(%)).

4.2 Influence of FLNP on parallel efficiency

In Figure 6, we plot the computing time per step of the parallel algorithm for different values of $\delta$. The number of processors is 8 and the number of particles is 4,000,000. The experimental results show that the FLNP method improves the speed of the parallel algorithm much more than the two basic techniques, LinkCell and NeighborList. Firstly, LinkCell corresponds to the result obtained when $\delta$ equals zero in Figure 6, which shows that the speed with FLNP is about double the speed with LinkCell. Secondly, the basic NeighborList technique cannot be used on its own for the MD simulation of such a large-scale system with 4,000,000 particles. In fact, when 8 processors are used, each processor must handle 500,000 particles on average. If the basic NeighborList technique were used, it would take dozens of hours to build the neighbor list once.

Figure 6. CPU timings (ms per time step) under different $\delta$ with FLNP (axes: value of $\delta$, CPU time in ms/step).

The results also show that the value of $\delta$ influences the speed of the parallel algorithm, so $\delta$ must be chosen carefully. The optimal value of $\delta$ for our simulation lies in the range $[0.4\sigma, 0.5\sigma]$.

5. CONCLUSION

We design and implement a Cluster-based spatial decomposition algorithm that is suitable for the large-scale MD simulation of microscale thermophysical problems. Firstly, we exploit the local nature of MD simulation to eliminate inefficient global communication from our algorithm. Secondly, we propose three kinds of domain division strategies, which provide different efficiency and scalability. Both the theoretical analysis and the experimental results show that the 3-dimension domain division is the best one, especially when the load balance is well maintained, and that spatial decomposition with this strategy is well suited to large-scale MD simulation on a Cluster because of its scalability and high efficiency.

Another important optimizing strategy in short-range MD simulation is to minimize the number of neighboring particles that must be checked for possible interactions. This paper proposes and implements a new method called fast location of neighboring particles (FLNP), which combines the benefits of both LinkCell and NeighborList and can greatly accelerate the calculation of interactions. $\delta$ is the most important parameter of this new method and can greatly influence the efficiency of the parallel algorithm.

In the MD simulation of thermophysical problems, it is important to maintain the load balance of the parallel SD algorithm. In the future, we will improve the load balance strategy and make the SD algorithm applicable to all kinds of thermophysical MD simulations.

ACKNOWLEDGEMENT

This work was supported by the Foundation Research Fund from Tsinghua University of China (Grant No. Jc200024).

REFERENCES

[1] Chou F C, Lukes J R, Liang X G, et al., Molecular Dynamics in Microscale Thermophysical Engineering, Annual Review of Heat Transfer, 10, 1999, 141-176.
[2] M. Pütz, A. Kolb, Optimization techniques for parallel molecular dynamics using domain decomposition, Computer Physics Communications, 113(2-3), 1998.
[3] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, Journal of Computational Physics, 117(1), 1995, 1-19.
[4] Koradi R., Billeter M., Güntert P., Point-centered domain decomposition for parallel molecular dynamics simulation, Computer Physics Communications, 124(2-3), 2000, 139-147.
[5] Ryoko Hayashi, Susumu Horiguchi, Parallel molecular dynamics simulations of polymers (in Japanese), Transactions of Information Processing Society of Japan, 39(6), 1998.


More information

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid Demin Wang 2, Hong Zhu 1, and Xin Liu 2 1 College of Computer Science and Technology, Jilin University, Changchun

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

The Elimination Eyelash Iris Recognition Based on Local Median Frequency Gabor Filters

The Elimination Eyelash Iris Recognition Based on Local Median Frequency Gabor Filters Journal of Information Hiding and Multimedia Signal Processing c 2015 ISSN 2073-4212 Ubiquitous International Volume 6, Number 3, May 2015 The Elimination Eyelash Iris Recognition Based on Local Median

More information

6. Parallel Volume Rendering Algorithms

6. Parallel Volume Rendering Algorithms 6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks

More information

CRYSTAL in parallel: replicated and distributed. Ian Bush Numerical Algorithms Group Ltd, HECToR CSE

CRYSTAL in parallel: replicated and distributed. Ian Bush Numerical Algorithms Group Ltd, HECToR CSE CRYSTAL in parallel: replicated and distributed (MPP) data Ian Bush Numerical Algorithms Group Ltd, HECToR CSE Introduction Why parallel? What is in a parallel computer When parallel? Pcrystal MPPcrystal

More information

A MPI-based parallel pyramid building algorithm for large-scale RS image

A MPI-based parallel pyramid building algorithm for large-scale RS image A MPI-based parallel pyramid building algorithm for large-scale RS image Gaojin He, Wei Xiong, Luo Chen, Qiuyun Wu, Ning Jing College of Electronic and Engineering, National University of Defense Technology,

More information

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Benzi John* and M. Damodaran** Division of Thermal and Fluids Engineering, School of Mechanical and Aerospace

More information

NUMERICAL ANALYSIS OF WIND EFFECT ON HIGH-DENSITY BUILDING AERAS

NUMERICAL ANALYSIS OF WIND EFFECT ON HIGH-DENSITY BUILDING AERAS NUMERICAL ANALYSIS OF WIND EFFECT ON HIGH-DENSITY BUILDING AERAS Bin ZHAO, Ying LI, Xianting LI and Qisen YAN Department of Thermal Engineering, Tsinghua University Beijing, 100084, P.R. China ABSTRACT

More information

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs

More information

A Rapid Automatic Image Registration Method Based on Improved SIFT

A Rapid Automatic Image Registration Method Based on Improved SIFT Available online at www.sciencedirect.com Procedia Environmental Sciences 11 (2011) 85 91 A Rapid Automatic Image Registration Method Based on Improved SIFT Zhu Hongbo, Xu Xuejun, Wang Jing, Chen Xuesong,

More information

Local Multilevel Fast Multipole Algorithm for 3D Electromagnetic Scattering

Local Multilevel Fast Multipole Algorithm for 3D Electromagnetic Scattering Progress In Electromagnetics Research Symposium 2005, Hangzhou, China, August 22-26 745 Local Multilevel Fast Multipole Algorithm for 3D Electromagnetic Scattering Jun Hu, Zaiping Nie, Lin Lei, and Jun

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

Research on Design and Application of Computer Database Quality Evaluation Model

Research on Design and Application of Computer Database Quality Evaluation Model Research on Design and Application of Computer Database Quality Evaluation Model Abstract Hong Li, Hui Ge Shihezi Radio and TV University, Shihezi 832000, China Computer data quality evaluation is the

More information

WETTING PROPERTIES OF STRUCTURED INTERFACES COMPOSED OF SURFACE-ATTACHED SPHERICAL NANOPARTICLES

WETTING PROPERTIES OF STRUCTURED INTERFACES COMPOSED OF SURFACE-ATTACHED SPHERICAL NANOPARTICLES November 20, 2018 WETTING PROPERTIES OF STRUCTURED INTERFACES COMPOSED OF SURFACE-ATTACHED SPHERICAL NANOPARTICLES Bishal Bhattarai and Nikolai V. Priezjev Department of Mechanical and Materials Engineering

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds 9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School

More information

Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation

Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation Taisuke Boku, Hajime Susa, Masayuki Umemura, Akira Ukawa Center for Computational Physics, University of Tsukuba

More information

Interaction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang

Interaction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang 4th International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2015) Interaction of Fluid Simulation Based on PhysX Physics Engine Huibai Wang, Jianfei Wan, Fengquan Zhang College

More information

Performance Analysis of the Lattice Boltzmann Method on x86-64 Architectures

Performance Analysis of the Lattice Boltzmann Method on x86-64 Architectures Performance Analysis of the Lattice Boltzmann Method on x86-64 Architectures Jan Treibig, Simon Hausmann, Ulrich Ruede Zusammenfassung The Lattice Boltzmann method (LBM) is a well established algorithm

More information

SOLIDWORKS Flow Simulation Options

SOLIDWORKS Flow Simulation Options SOLIDWORKS Flow Simulation Options SOLIDWORKS Flow Simulation includes an options dialogue window that allows for defining default options to use for a new project. Some of the options included are unit

More information

CE 530 Molecular Simulation

CE 530 Molecular Simulation 1 CE 530 Molecular Simulation Lecture 3 Common Elements of a Molecular Simulation David A. Kofke Department of Chemical Engineering SUNY Buffalo kofke@eng.buffalo.edu 2 Boundary Conditions Impractical

More information

Line Net Global Vectorization: an Algorithm and Its Performance Evaluation

Line Net Global Vectorization: an Algorithm and Its Performance Evaluation Line Net Global Vectorization: an Algorithm and Its Performance Evaluation Jiqiang Song 1, Feng Su 1, Jibing Chen 1, Chiewlan Tai 2, and Shijie Cai 1 1 Department of Computer Science of Nanjing University,

More information

A COMPETITION BASED ROOF DETECTION ALGORITHM FROM AIRBORNE LIDAR DATA

A COMPETITION BASED ROOF DETECTION ALGORITHM FROM AIRBORNE LIDAR DATA A COMPETITION BASED ROOF DETECTION ALGORITHM FROM AIRBORNE LIDAR DATA HUANG Xianfeng State Key Laboratory of Informaiton Engineering in Surveying, Mapping and Remote Sensing (Wuhan University), 129 Luoyu

More information

S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs

S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs Elmar Westphal - Forschungszentrum Jülich GmbH Spheroids Spheroid: A volume formed by rotating an ellipse around one of its axes Two

More information

Dynamics Response of Spatial Parallel Coordinate Measuring Machine with Clearances

Dynamics Response of Spatial Parallel Coordinate Measuring Machine with Clearances Sensors & Transducers 2013 by IFSA http://www.sensorsportal.com Dynamics Response of Spatial Parallel Coordinate Measuring Machine with Clearances Yu DENG, Xiulong CHEN, Suyu WANG Department of mechanical

More information

Microwell Mixing with Surface Tension

Microwell Mixing with Surface Tension Microwell Mixing with Surface Tension Nick Cox Supervised by Professor Bruce Finlayson University of Washington Department of Chemical Engineering June 6, 2007 Abstract For many applications in the pharmaceutical

More information

Using the Discrete Ordinates Radiation Model

Using the Discrete Ordinates Radiation Model Tutorial 6. Using the Discrete Ordinates Radiation Model Introduction This tutorial illustrates the set up and solution of flow and thermal modelling of a headlamp. The discrete ordinates (DO) radiation

More information

Material Made of Artificial Molecules and Its Refraction Behavior under Microwave

Material Made of Artificial Molecules and Its Refraction Behavior under Microwave Material Made of Artificial Molecules and Its Refraction Behavior under Microwave Tao Zhang College of Nuclear Science and Technology, Beijing Normal University, Beijing 100875, China (taozhang@bnu.edu.cn)

More information

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System 第一工業大学研究報告第 27 号 (2015)pp.13-17 13 A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System Kazuo Hajikano* 1 Hidehiro Kanemitsu* 2 Moo Wan Kim* 3 *1 Department of Information Technology

More information

Study on the Design Method of Impeller on Low Specific Speed Centrifugal Pump

Study on the Design Method of Impeller on Low Specific Speed Centrifugal Pump Send Orders for Reprints to reprints@benthamscience.ae 594 The Open Mechanical Engineering Journal, 2015, 9, 594-600 Open Access Study on the Design Method of Impeller on Low Specific Speed Centrifugal

More information

Phase-field simulation of two-phase micro-flows in a Hele-Shaw cell

Phase-field simulation of two-phase micro-flows in a Hele-Shaw cell Computational Methods in Multiphase Flow III 7 Phase-field simulation of two-phase micro-flows in a Hele-Shaw cell Y. Sun & C. Beckermann Department of Mechanical and Industrial Engineering, University

More information

Fast Multipole Method on the GPU

Fast Multipole Method on the GPU Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Accelerating Molecular Modeling Applications with Graphics Processors

Accelerating Molecular Modeling Applications with Graphics Processors Accelerating Molecular Modeling Applications with Graphics Processors John Stone Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign Research/gpu/ SIAM Conference

More information

Die Wear Profile Investigation in Hot Forging

Die Wear Profile Investigation in Hot Forging Die Wear Profile Investigation in Hot Forging F. R. Biglari, M Zamani Abstract In this study, the wear profile on the die surface during the hot forging operation for an axisymmetric cross-section is examined.

More information

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123 2.7 Cloth Animation 320491: Advanced Graphics - Chapter 2 123 Example: Cloth draping Image Michael Kass 320491: Advanced Graphics - Chapter 2 124 Cloth using mass-spring model Network of masses and springs

More information

Determining The Surface Tension Of Water Via Light Scattering

Determining The Surface Tension Of Water Via Light Scattering Determining The Surface Tension Of Water Via Light Scattering Howard Henry Physics Department, The College of Wooster, Wooster, Ohio 44691, USA (Dated: May 10, 007) The diffraction pattern created by the

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

Molecular Dynamics Simulations with Julia

Molecular Dynamics Simulations with Julia Emily Crabb 6.338/18.337 Final Project Molecular Dynamics Simulations with Julia I. Project Overview This project consists of one serial and several parallel versions of a molecular dynamics simulation

More information

Methods for Division of Road Traffic Network for Distributed Simulation Performed on Heterogeneous Clusters

Methods for Division of Road Traffic Network for Distributed Simulation Performed on Heterogeneous Clusters DOI:10.2298/CSIS120601006P Methods for Division of Road Traffic Network for Distributed Simulation Performed on Heterogeneous Clusters Tomas Potuzak 1 1 University of West Bohemia, Department of Computer

More information

Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1

Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Acta Technica 62 No. 3B/2017, 141 148 c 2017 Institute of Thermomechanics CAS, v.v.i. Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Zhang Fan 2, 3, Tan Yuegang

More information

Research Article An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network

Research Article An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network e Scientific World Journal, Article ID 121609, 7 pages http://dx.doi.org/10.1155/2014/121609 Research Article An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network Zhixiao

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

SCALING UP OF E-MSR CODES BASED DISTRIBUTED STORAGE SYSTEMS WITH FIXED NUMBER OF REDUNDANCY NODES

SCALING UP OF E-MSR CODES BASED DISTRIBUTED STORAGE SYSTEMS WITH FIXED NUMBER OF REDUNDANCY NODES SCALING UP OF E-MSR CODES BASED DISTRIBUTED STORAGE SYSTEMS WITH FIXED NUMBER OF REDUNDANCY NODES Haotian Zhao, Yinlong Xu and Liping Xiang School of Computer Science and Technology, University of Science

More information

Contour-Based Large Scale Image Retrieval

Contour-Based Large Scale Image Retrieval Contour-Based Large Scale Image Retrieval Rong Zhou, and Liqing Zhang MOE-Microsoft Key Laboratory for Intelligent Computing and Intelligent Systems, Department of Computer Science and Engineering, Shanghai

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

A Comprehensive Study on the Performance of Implicit LS-DYNA

A Comprehensive Study on the Performance of Implicit LS-DYNA 12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four

More information

Extending SLURM with Support for GPU Ranges

Extending SLURM with Support for GPU Ranges Available on-line at www.prace-ri.eu Partnership for Advanced Computing in Europe Extending SLURM with Support for GPU Ranges Seren Soner a, Can Özturana,, Itir Karac a a Computer Engineering Department,

More information

Ateles performance assessment report

Ateles performance assessment report Ateles performance assessment report Document Information Reference Number Author Contributor(s) Date Application Service Level Keywords AR-4, Version 0.1 Jose Gracia (USTUTT-HLRS) Christoph Niethammer,

More information

ZigBee Routing Algorithm Based on Energy Optimization

ZigBee Routing Algorithm Based on Energy Optimization Sensors & Transducers 2013 by IFSA http://www.sensorsportal.com ZigBee Routing Algorithm Based on Energy Optimization Wangang Wang, Yong Peng, Yongyu Peng Chongqing City Management College, No. 151 Daxuecheng

More information

Study on fabric density identification based on binary feature matrix

Study on fabric density identification based on binary feature matrix 153 Study on fabric density identification based on binary feature matrix Xiuchen Wang 1,2 Xiaojiu Li 2 Zhe Liu 1 1 Zhongyuan University of Technology Zhengzhou, China 2Tianjin Polytechnic University Tianjin,

More information

Scalability Study of Particle Method with Dynamic Load Balancing

Scalability Study of Particle Method with Dynamic Load Balancing Scalability Study of Particle Method with Dynamic Load Balancing Hailong Teng Livermore Software Technology Corp. Abstract We introduce an efficient load-balancing algorithm for particle method (Particle

More information

Assessment of LS-DYNA Scalability Performance on Cray XD1

Assessment of LS-DYNA Scalability Performance on Cray XD1 5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123

More information

Appendix E: Software

Appendix E: Software Appendix E: Software Video Analysis of Motion Analyzing pictures (movies or videos) is a powerful tool for understanding how objects move. Like most forms of data, video is most easily analyzed using a

More information

Introduction to geodynamic modelling Introduction to DOUAR

Introduction to geodynamic modelling Introduction to DOUAR Introduction to geodynamic modelling Introduction to DOUAR David Whipp and Lars Kaislaniemi Department of Geosciences and Geography, Univ. Helsinki 1 Goals of this lecture Introduce DOUAR, a 3D thermomechanical

More information

Architecture, Programming and Performance of MIC Phi Coprocessor

Architecture, Programming and Performance of MIC Phi Coprocessor Architecture, Programming and Performance of MIC Phi Coprocessor JanuszKowalik, Piotr Arłukowicz Professor (ret), The Boeing Company, Washington, USA Assistant professor, Faculty of Mathematics, Physics

More information

MA 243 Calculus III Fall Assignment 1. Reading assignments are found in James Stewart s Calculus (Early Transcendentals)

MA 243 Calculus III Fall Assignment 1. Reading assignments are found in James Stewart s Calculus (Early Transcendentals) MA 43 Calculus III Fall 8 Dr. E. Jacobs Assignments Reading assignments are found in James Stewart s Calculus (Early Transcendentals) Assignment. Spheres and Other Surfaces Read. -. and.6 Section./Problems

More information

OPTIMIZATION OF MONTE CARLO TRANSPORT SIMULATIONS IN STOCHASTIC MEDIA

OPTIMIZATION OF MONTE CARLO TRANSPORT SIMULATIONS IN STOCHASTIC MEDIA PHYSOR 2012 Advances in Reactor Physics Linking Research, Industry, and Education Knoxville, Tennessee, USA, April 15-20, 2012, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2012) OPTIMIZATION

More information

Parallel Summation of Inter-Particle Forces in SPH

Parallel Summation of Inter-Particle Forces in SPH Parallel Summation of Inter-Particle Forces in SPH Fifth International Workshop on Meshfree Methods for Partial Differential Equations 17.-19. August 2009 Bonn Overview Smoothed particle hydrodynamics

More information

Parallel Implementation of a Random Search Procedure: An Experimental Study

Parallel Implementation of a Random Search Procedure: An Experimental Study Parallel Implementation of a Random Search Procedure: An Experimental Study NIKOLAI K. KRIVULIN Faculty of Mathematics and Mechanics St. Petersburg State University 28 University Ave., St. Petersburg,

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate

More information

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization Hai Zhao and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University, 1954

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Efficient Computation of Radial Distribution Function on GPUs

Efficient Computation of Radial Distribution Function on GPUs Efficient Computation of Radial Distribution Function on GPUs Yi-Cheng Tu * and Anand Kumar Department of Computer Science and Engineering University of South Florida, Tampa, Florida 2 Overview Introduction

More information

Meshless physical simulation of semiconductor devices using a wavelet-based nodes generator

Meshless physical simulation of semiconductor devices using a wavelet-based nodes generator Meshless physical simulation of semiconductor devices using a wavelet-based nodes generator Rashid Mirzavand 1, Abdolali Abdipour 1a), Gholamreza Moradi 1, and Masoud Movahhedi 2 1 Microwave/mm-wave &

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Parallelization of Scientific Applications (II)

Parallelization of Scientific Applications (II) Parallelization of Scientific Applications (II) Parallelization of Particle Based Methods Russian-German School on High Performance Computer Systems, June, 27 th until July, 6 th 2005, Novosibirsk 4. Day,

More information

A Novel Extreme Point Selection Algorithm in SIFT

A Novel Extreme Point Selection Algorithm in SIFT A Novel Extreme Point Selection Algorithm in SIFT Ding Zuchun School of Electronic and Communication, South China University of Technolog Guangzhou, China zucding@gmail.com Abstract. This paper proposes

More information

MOLECULAR DYNAMICS ON DISTRIBUTED-MEMORY MIMD COMPUTERS WITH LOAD BALANCING 1. INTRODUCTION

MOLECULAR DYNAMICS ON DISTRIBUTED-MEMORY MIMD COMPUTERS WITH LOAD BALANCING 1. INTRODUCTION BSME International Congress and Exposition Chicago, IL November 6-11, 1994 MOLECULAR DYNAMICS ON DISTRIBUTED-MEMORY MIMD COMPUTERS WITH LOAD BALANCING YUEFAN DENG, R. ALAX MCCOY, ROBERT B. MARR, RONALD

More information

CFD Post-Processing of Rampressor Rotor Compressor

CFD Post-Processing of Rampressor Rotor Compressor Gas Turbine Industrial Fellowship Program 2006 CFD Post-Processing of Rampressor Rotor Compressor Curtis Memory, Brigham Young niversity Ramgen Power Systems Mentor: Rob Steele I. Introduction Recent movements

More information

COMPUTER EXERCISE: POPULATION DYNAMICS IN SPACE September 3, 2013

COMPUTER EXERCISE: POPULATION DYNAMICS IN SPACE September 3, 2013 COMPUTER EXERCISE: POPULATION DYNAMICS IN SPACE September 3, 2013 Objectives: Introduction to coupled maps lattice as a basis for spatial modeling Solve a spatial Ricker model to investigate how wave speed

More information

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-"&"3 -"(' ( +-" " " % '.+ % ' -0(+$,

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-&3 -(' ( +-   % '.+ % ' -0(+$, The structure is a very important aspect in neural network design, it is not only impossible to determine an optimal structure for a given problem, it is even impossible to prove that a given structure

More information

True 3D CAE visualization of filling imbalance in geometry-balanced runners

True 3D CAE visualization of filling imbalance in geometry-balanced runners True 3D CAE visualization of filling imbalance in geometry-balanced runners C.C. Chien, * C.C. Chiang, W. H. Yang, Vito Tsai and David C.Hsu CoreTech System Co.,Ltd., HsinChu, Taiwan, ROC Abstract The

More information

CSC 391/691: GPU Programming Fall N-Body Problem. Copyright 2011 Samuel S. Cho

CSC 391/691: GPU Programming Fall N-Body Problem. Copyright 2011 Samuel S. Cho CSC 391/691: GPU Programming Fall 2011 N-Body Problem Copyright 2011 Samuel S. Cho Introduction Many physical phenomena can be simulated with a particle system where each particle interacts with all other

More information

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis 1 Xulin LONG, 1,* Qiang CHEN, 2 Xiaoya

More information

Benchmark runs of pcmalib on Nehalem and Shanghai nodes

Benchmark runs of pcmalib on Nehalem and Shanghai nodes MOSAIC group Institute of Theoretical Computer Science Department of Computer Science Benchmark runs of pcmalib on Nehalem and Shanghai nodes Christian Lorenz Müller, April 9 Addresses: Institute for Theoretical

More information

Application of Geometry Rectification to Deformed Characters Recognition Liqun Wang1, a * and Honghui Fan2

Application of Geometry Rectification to Deformed Characters Recognition Liqun Wang1, a * and Honghui Fan2 6th International Conference on Electronic, Mechanical, Information and Management (EMIM 2016) Application of Geometry Rectification to Deformed Characters Liqun Wang1, a * and Honghui Fan2 1 School of

More information