Parallel Computation of the Functions Constructed with

Similar documents
NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

3D vector computer graphics


Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Cluster Analysis of Electrical Behavior

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

Programming in Fortran 90 : 2017/2018

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Wavefront Reconstructor

High-Boost Mesh Filtering for 3-D Shape Enhancement

Multiblock method for database generation in finite element programs

Hermite Splines in Lie Groups as Products of Geodesics

Mathematics 256 a course in differential equations for engineering students

Barycentric Coordinates. From: Mean Value Coordinates for Closed Triangular Meshes by Ju et al.

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

FPGA-based implementation of circular interpolation

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

Analysis of 3D Cracks in an Arbitrary Geometry with Weld Residual Stress

Support Vector Machines

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

A Binarization Algorithm specialized on Document Images and Photos

Kinematics of pantograph masts

Parallel matrix-vector multiplication

Modeling, Manipulating, and Visualizing Continuous Volumetric Data: A Novel Spline-based Approach

On Some Entertaining Applications of the Concept of Set in Computer Science Course

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Dynamic wetting property investigation of AFM tips in micro/nanoscale

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

An Optimal Algorithm for Prufer Codes *

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

S.P.H. : A SOLUTION TO AVOID USING EROSION CRITERION?

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

Color in OpenGL Polygonal Shading Light Source in OpenGL Material Properties Normal Vectors Phong model

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

Very simple computational domains can be discretized using boundary-fitted structured meshes (also called grids)

Scan Conversion & Shading

The Research of Ellipse Parameter Fitting Algorithm of Ultrasonic Imaging Logging in the Casing Hole

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Scan Conversion & Shading

Numerical model describing optimization of fibres winding process on open and closed frame

The Codesign Challenge

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Load Balancing for Hex-Cell Interconnection Network

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

S1 Note. Basis functions.

A high precision collaborative vision measurement of gear chamfering profile

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

Line Clipping by Convex and Nonconvex Polyhedra in E 3

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB

Investigations of Topology and Shape of Multi-material Optimum Design of Structures

REFRACTION. a. To study the refraction of light from plane surfaces. b. To determine the index of refraction for Acrylic and Water.

Lecture Note 08 EECS 4101/5101 Instructor: Andy Mirzaian. All Nearest Neighbors: The Lifting Method

Lecture 5: Multilayer Perceptrons

Solving two-person zero-sum game by Matlab

User Authentication Based On Behavioral Mouse Dynamics Biometrics

Virtual Machine Migration based on Trust Measurement of Computer Node

Analysis on the Workspace of Six-degrees-of-freedom Industrial Robot Based on AutoCAD

Smoothing Spline ANOVA for variable screening

Unsupervised Learning

PYTHON IMPLEMENTATION OF VISUAL SECRET SHARING SCHEMES

Using Radial Basis Functions to Solve Geodesics Equations for Body Measurements *

Analysis of Continuous Beams in General

Six-axis Robot Manipulator Numerical Control Programming and Motion Simulation

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

PHYSICS-ENHANCED L-SYSTEMS

A New Approach For the Ranking of Fuzzy Sets With Different Heights

Interpolation of the Irregular Curve Network of Ship Hull Form Using Subdivision Surfaces

Distance based similarity measures of fuzzy sets

Classifier Selection Based on Data Complexity Measures *

Pose, Posture, Formation and Contortion in Kinematic Systems

AVO Modeling of Monochromatic Spherical Waves: Comparison to Band-Limited Waves

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Machine Learning 9. week

Computing the Minimum Directed Distances Between Convex Polyhedra

Kiran Joy, International Journal of Advanced Engineering Technology E-ISSN

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

APPLICATION OF AN AUGMENTED REALITY SYSTEM FOR DISASTER RELIEF

THE PULL-PUSH ALGORITHM REVISITED

Design of Structure Optimization with APDL

COMPARISON OF A MACHINE OF MEASUREMENT WITHOUT CONTACT AND A CMM (1) : OPTIMIZATION OF THE PROCESS OF METROLOGY.

Lecture #15 Lecture Notes

Monte Carlo 1: Integration

COMPLETE CALCULATION OF DISCONNECTION PROBABILITY IN PLANAR GRAPHS. G. Tsitsiashvili. IAM, FEB RAS, Vladivostok, Russia s:

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

Transcription:

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) Parallel Computaton of the Functons Constructed wth R-operatons usng CUDA Roman A. Uvarov Podgorny Insttute for Mechancal Engneerng Problems of NAS of Ukrane, /10 Dm. Pozharsky str., Kharkv, Ukrane uvarov@pmach.kharkov.ua Abstract. The developed software system "AutoOmega" for automaton of modelng varous physcal felds n twodmensonal domans of complex geometrcal shapes s represented. Automaton s acheved by solvng the drect and nverse problems of automatc translaton of geometrc nformaton nto analytcal and programmng form, supportng the meshless natural elements method wth R-functons method as well as usng CUDA technology for parallel computng the sngle analytcal functon descrbng the geometrcal object n two- and three-dmensonal domans. Keywords Parallel Computng, the Theory of R-functons, CUDA Programmng, Natural Elements Method. 1 Introducton The problem of adequate mplementaton of parallelsm s acute manly for the purpose of recevng the results of solvng computatonal problems n tme as less as possble. It s based on the fact that the task s dvded nto a set of smaller tasks that can be solved smultaneously. There are many dfferent types of solutons, both physcal and software, and one of them s CUDA technology featured n Nvda s GPUs of 8 seres and above. The am of ths work s to parallelze the computaton n the software package that mplementng the automaton process enhanced by R-functons for researchng the felds of dfferent physcal nature n two-dmensonal domans wth complex geometres usng the meshless method of natural elements (Natural Elements Method) for solvng boundary value problems. Theoretcal Part The mathematcal models of physcal and mechancal felds are the problems for partal dfferental equatons under certan boundary and ntal condtons. A specfc feature of such felds s ther dependence not only on the nature of physcal laws taken nto account by the relevant equatons, but also on the shape and the relatve poston of objects, n whch the felds are nvoked. The presence of two dfferent knds of nformaton (analytcal and geometrcal) n the formulaton of boundary value problems s the actual problem, whle creatng the methods and algorthms for ther solutons [1, ]. Dversty, complexty and unqueness of the forms of geometrc objects determned the relevance of the nverse problem of analytcal geometry, formulated by R. Descartes: gven a geometrcal object, ts equaton should be wrtten. The theory of R-functons developed by Academcan V. L. Rvachev [1] allows to solve ths problem. It makes possble to descrbe the geometrcal objects by functons whose values determne the boundary, nteror and geometrcal propertes of the object. Constructon of equatons of the geometrcal object s boundary requres the defnton of support functons and a logcal formula, whch allows to obtan the equaton analytcally wth an approprate choce of the R-operatons. In general, the followng set of R-operatons s used for constructng the functons ω -330-

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) 1 x y = x + y x + y α αxy, 1+ α 1 x y = x + y + x + y α αxy, (1) 1+ α where α = α(, y), 1 < α 1 x. x = x, In addton, the property of functon s normalzaton may be mplemented: ω ν = ± 1, () ω = 0 where ν s a normal vector to Ω. At present tmes, the R-functons method (RFM) s actvely developed and appled. The method [1] for constructng the equatons of a complex geometrcal object s a good technology foundaton for the automaton of the composton process for these equatons. A unversal logcal formula descrbng a composte geometrcal form wth the R-operatons (1) and property of functon s normalzaton () s proposed [3]: N N + K ω = ( n ) ( ex α ω α α ω ), (3) = 1 = N + 1 where ω n s a functon of geometrcal object s boundary that descrbes ts nteror, whle ω ex s a functon that descrbes ts exteror. It s suffcent to ndcate knds of used standard geometrc objects and parameters that defne ther postons and szes..1 «AutoOmega» system The software system «AutoOmega» was developed wth a constructve apparatus of the theory of R-functons. Input nformaton n ths system ncludes knds of used standard geometrcal objects, geometrc parameters that determne ther postons and szes, rotaton angles, and the wdths of the rngs, n a case of a doubly connected doman. Support functons n form of normalzed equatons as well as predcate and analytcal functons of the composte object (3) are automatcally generated wth ths nformaton. Ths method has been called the standard prmtves method [3]. Prmtves n «AutoOmega» are dvded nto groups characterzed by a growng number of defnng parameters and nterface to set them. So, to set the functon of: a crcle, 3 parameters are needed (center coordnates and radus); an ellpse, 4 parameters are needed (center coordnates and half szes of axes); a rectangle, 4 parameters are needed (center coordnates and half szes of sdes); a trangle, a segment of crcle, and a sector of crcle, 6 parameters are needed (coordnates of three vertces); a convex quadrangle, 8 parameters are needed (coordnates of four vertces); a regular polygon, 4 parameters are needed (coordnates of crcle s center and ts radus as well as a number of vertces). Constructve apparatus of the theory of R-functons s used for constructon of each prmtve. Let s consder some of the prmtves. For example, the regular polygon. It wll be bult wth parameters of ts crcumcrcle and a number of polygon vertces. Then, durng the transton to polar coordnates we use the followng formulas ρ = ( x ) ( y y ) xrp + rp, -331-

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) where y y rp arctan, when x x rp y y rp arctan + π, when x x rp θ = y y rp arctan + π, when x x rp y y rp arctan + π, when x x rp x rp, y rp are coordnates of regular polygon s crcumcenter. Let s ntroduce the parameter x > 0, y > 0 x < 0, y > 0 x < 0, y < 0 x > 0, y < 0 8 m = + 1 θ n ( 1) µ ( 1) sn π n = 1 ( 1) where n s a number of polygon s vertces, and m a a number of terms. Normalzed functon of a regular polygon s descrbed by: ω rp = ( ρ cos µ rrp ), (4) where r rp s a rados of polygon s crcumcrcle. Normalzed functon of the rng of a regular polygon s descrbed by: f1 = ( ρ cos µ rrp ), f = ( ρ cos µ ( rrp drp) ), where ωrrp = f 1 α f, (5) d rp s a wdth of the rng of a regular polygon. To rotate the angle ϕ rp of a regular polygon t s needed to replace varables x and y nvolved n the calculaton of the parameters used n equatons (4) - (5) wth varables x rrp and y rrp by the followng formula: x rrp = cosϕ rp ( x xrp ) sn ϕrp ( y yrp ) + xrp, y rrp = sn ϕ rp ( x xrp ) + cosϕrp ( y yrp ) + yrp. The dalog for the regular polygon (Fgure 1) allows the user to automatcally construct a regular polygon by gven coordnates of the center and the radus of the crcle. Fg.1. Dalog and pre-dalog for edtng the regular polygon -33-

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) The default number of vertces of the polygon s set to three, but can be changed to a hgher number (lmted to 50, as a vsual polygon becomes a crcle) n the pre-dalog (Fgure 1). The translator ncluded n «AutoOmega» system allows to automatcally generate the lnes of code n problemorented language for a specfc scentfc system such as MATLAB or POLYE-RL based on the parameters, and to run t to executon for solvng a specfc boundary value problem. Translator dalog box s dvded nto two panels. The left pane shows the contents of the AO-fle n format of «AutoOmega» system, whch stores settngs of prmtves. The rght pane s flled wth the content of the automatcally generated fle n a problem-orented language. Parameters of prmtves are used for constructon of the support functons and the functons of a complex doman. The result s a pcture of the level lnes of the feld n the doman, bult n the «AutoOmega» system. Ths system was customzed for the problems of researchng the temperature felds on the board wth heat sources emttng the heat wth known ntensty [4], and problems of calculatng the metal products wth a known profle of complex shape, such as a ral (Fgure ), Larsen rabbet.. Natural Elements Method Fg.. Model of a ral constructed n «AutoOmega» and calculated n POLYE-RL The meshless method of natural elements (NEM) was selected as a method for solvng boundary value problems n «AutoOmega». It can be descrbed as a method that uses Sbson and non-sbsonan functons as nterpolants. The mesh s not requred for the nterpolants constructon. Intally, natural neghbor nterpolaton of the elements were ntroduced by Sbson for smoothng the scattered data[5]. It s based on the Vorono cells T defned as T = { x R : d( x, x ) < d( x, x j ) j }, where d ( x, x j ) s a dstance (a Eucldean norm) between x and Sbson functons or functons of natural neghbor elements are defned as rato of the polygonal areas of the Vorono dagram. Hence, A ( x) Φ ( ) x =, A( x) where A ( x) = Tx s a total area of the Vorono cell for x and A ( x) = T Tx s an area of partal coverage of - node by Vorono cell (Fgure 3). x j. -333-

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) Fg.3. Constructon of Sbson functon Key dfferences that separate the NEM from the fnte elements method (FEM) le n the desgn and numercal mplementaton of shape functons, and n the methodology used for assemblng the stffness matrx. The algorthm proposed by Watson [6] for constructng the shape functon was mplemented. All other steps are common to both methods..3 CUDA applance CUDA (an acronym for Compute Unfed Devce Archtecture) for Nvda s graphcs processng unts of 8-seres or hgher was used for parallel computng n the «AutoOmega» system. The followng parameters were used: CPU Intel Core 7-3770K; GPU NVIDIA GeForce GTX 680 (8 Multprocessors x 19 CUDA Cores/MP = 1536 CUDA Cores); Mcrosoft Vsual Studo 010 Express Edton (C++); NVIDIA CUDA Toolkt v5.0. Workflow of the program wth CUDA had the followng specfcs: Copyng the data from the man memory to the memory of the GPU; CPU allows the GPU to perform the task; GPU executes n parallel on each of ts cores; Copyng the result from the GPU memory to the man memory. A smplest program sample to calculate the R-conjuncton of two real arguments usng CUDA s stated below. #nclude <ostream> devce float R_ANDem(float a, float b ) { float alpha = 0.5; return 1.0/(1+alpha)*(a+b-sqrt((a*a+b*b-*alpha*a*b)*1.0)); } global vod R_AND( float a, float b, float *c ) { *c = R_ANDem( a, b ); } nt man( vod ) { float c; nt *dev_c; HANDLE_ERROR( cudamalloc( (vod**)&dev_c, szeof(nt) ) ); R_AND <<<1,1>>> (, 7, dev_c ); -334-

PDCS 013 (Ukrane, Kharkv, March 13-14, 013) HANDLE_ERROR(cudaMemcpy( &c, dev_c, szeof(float), cudamemcpydevcetohost ) ); prntf( "Result = %5.f\n", c ); cudafree( dev_c ); return 0; } The calculaton of a sngle analytc functon, descrbng the geometrcal object and consstng of a number of support functons, n each pont of specfed area needs the parallelzaton. It s obvous that the efforts for computng the functon of a complex geometrcal object wll ncrease wth the number of the support functons f such a process s sequental. The dea of parallel computng n ths case s based on the fact that the problem of computng the functon of a complex geometrcal object s dvded nto a set of tasks for computng the support functons, whch can be solved smultaneously. In ths case, the dea was mplemented wth the Nvda s CUDA. For comparson purposes, calculatons of varous functons wth R-operatons and wthout them were carred out n the two-dmensonal doman (sutable for «AutoOmega» system) and n three-dmensonal doman (sutable for currently n development «AutoOmega 3D» system that s a successor of «AutoOmega» system s deas). Calculatons were performed usng both the CPU and GPU. The results are presented n Table 1. Tab.1. Total tme of calculaton of the functons n each pont of the threedmensonal doman of 4000 x 4000 x 40 ponts n sze, n seconds Functon CPU GPU Unbounded cylnder.44 0.01 R-conjuncton of scalar values (alpha = 0.5) 11.89 0.01 R-conjuncton of two cylnders 14.65 0.01 Prsm wth a convex quadrangle as a base 1.3 0.01 R-dsjuncton of two spheres 13.9 0.01 Parallelepped 14.5 0.01 3 Concluson In general, Nvda s CUDA technology s orented on solvng the dynamcal problems of graphcal and computatonal knds. In ths artcle ts applance to stage of computng the sngle analytc functon constructed based on the constructve apparatus of the theory of R-functons was analyzed. Obtaned results allow hghlght ts benefts and lmtatons. In general, the use of CUDA technology has sgnfcantly reduced the tme spent on computng. References [1] V. L. Rvachev Theory of R-functons and ts several applcatons. Kev: Nauk. dumka, 198. 55 p. [n Russan] [] V. L. Rvachev, A. P. Slesarenko Algebra of logc and ntegral transforms n boundary value problems Kev: Nauk. dumka, 1977. 87 p. [n Russan] [3] K. V. Maksmenko-Sheyko, A. M. Matsevty, A. V. Tolok, T. I. Sheyko Constructve apparatus of R-functons method for automaton of constructng the equatons of complex geometrcal objects. Bulletn of Zaporozhye State Unversty, : 66-76, 005. [n Russan] [4] K. V. Maksmenko-Sheyko, R. A. Uvarov, T. I. Sheyko Mathematcal and computer modellng of heat modes of radoelectroncs tools by R-functons method. Electronc modelng. 31(4): 79-87, 009. [n Russan] [5] R. Sbson A vector dentty for the Drchlet tessellaton. Math. Proc. Cambrdge Phlos. Soc. 87: 151-155, 1980. [6] N. Sukumar, B. Moran, T. Belytschko The Natural Element Method n Sold Mechancs. Int. J. Numer. Meth. Engng. 43: 839-887, 1998. -335-