PDCS 013 (Ukrane, Kharkv, March 13-14, 013) Parallel Computaton of the Functons Constructed wth R-operatons usng CUDA Roman A. Uvarov Podgorny Insttute for Mechancal Engneerng Problems of NAS of Ukrane, /10 Dm. Pozharsky str., Kharkv, Ukrane uvarov@pmach.kharkov.ua Abstract. The developed software system "AutoOmega" for automaton of modelng varous physcal felds n twodmensonal domans of complex geometrcal shapes s represented. Automaton s acheved by solvng the drect and nverse problems of automatc translaton of geometrc nformaton nto analytcal and programmng form, supportng the meshless natural elements method wth R-functons method as well as usng CUDA technology for parallel computng the sngle analytcal functon descrbng the geometrcal object n two- and three-dmensonal domans. Keywords Parallel Computng, the Theory of R-functons, CUDA Programmng, Natural Elements Method. 1 Introducton The problem of adequate mplementaton of parallelsm s acute manly for the purpose of recevng the results of solvng computatonal problems n tme as less as possble. It s based on the fact that the task s dvded nto a set of smaller tasks that can be solved smultaneously. There are many dfferent types of solutons, both physcal and software, and one of them s CUDA technology featured n Nvda s GPUs of 8 seres and above. The am of ths work s to parallelze the computaton n the software package that mplementng the automaton process enhanced by R-functons for researchng the felds of dfferent physcal nature n two-dmensonal domans wth complex geometres usng the meshless method of natural elements (Natural Elements Method) for solvng boundary value problems. Theoretcal Part The mathematcal models of physcal and mechancal felds are the problems for partal dfferental equatons under certan boundary and ntal condtons. A specfc feature of such felds s ther dependence not only on the nature of physcal laws taken nto account by the relevant equatons, but also on the shape and the relatve poston of objects, n whch the felds are nvoked. The presence of two dfferent knds of nformaton (analytcal and geometrcal) n the formulaton of boundary value problems s the actual problem, whle creatng the methods and algorthms for ther solutons [1, ]. Dversty, complexty and unqueness of the forms of geometrc objects determned the relevance of the nverse problem of analytcal geometry, formulated by R. Descartes: gven a geometrcal object, ts equaton should be wrtten. The theory of R-functons developed by Academcan V. L. Rvachev [1] allows to solve ths problem. It makes possble to descrbe the geometrcal objects by functons whose values determne the boundary, nteror and geometrcal propertes of the object. Constructon of equatons of the geometrcal object s boundary requres the defnton of support functons and a logcal formula, whch allows to obtan the equaton analytcally wth an approprate choce of the R-operatons. In general, the followng set of R-operatons s used for constructng the functons ω -330-
PDCS 013 (Ukrane, Kharkv, March 13-14, 013) 1 x y = x + y x + y α αxy, 1+ α 1 x y = x + y + x + y α αxy, (1) 1+ α where α = α(, y), 1 < α 1 x. x = x, In addton, the property of functon s normalzaton may be mplemented: ω ν = ± 1, () ω = 0 where ν s a normal vector to Ω. At present tmes, the R-functons method (RFM) s actvely developed and appled. The method [1] for constructng the equatons of a complex geometrcal object s a good technology foundaton for the automaton of the composton process for these equatons. A unversal logcal formula descrbng a composte geometrcal form wth the R-operatons (1) and property of functon s normalzaton () s proposed [3]: N N + K ω = ( n ) ( ex α ω α α ω ), (3) = 1 = N + 1 where ω n s a functon of geometrcal object s boundary that descrbes ts nteror, whle ω ex s a functon that descrbes ts exteror. It s suffcent to ndcate knds of used standard geometrc objects and parameters that defne ther postons and szes..1 «AutoOmega» system The software system «AutoOmega» was developed wth a constructve apparatus of the theory of R-functons. Input nformaton n ths system ncludes knds of used standard geometrcal objects, geometrc parameters that determne ther postons and szes, rotaton angles, and the wdths of the rngs, n a case of a doubly connected doman. Support functons n form of normalzed equatons as well as predcate and analytcal functons of the composte object (3) are automatcally generated wth ths nformaton. Ths method has been called the standard prmtves method [3]. Prmtves n «AutoOmega» are dvded nto groups characterzed by a growng number of defnng parameters and nterface to set them. So, to set the functon of: a crcle, 3 parameters are needed (center coordnates and radus); an ellpse, 4 parameters are needed (center coordnates and half szes of axes); a rectangle, 4 parameters are needed (center coordnates and half szes of sdes); a trangle, a segment of crcle, and a sector of crcle, 6 parameters are needed (coordnates of three vertces); a convex quadrangle, 8 parameters are needed (coordnates of four vertces); a regular polygon, 4 parameters are needed (coordnates of crcle s center and ts radus as well as a number of vertces). Constructve apparatus of the theory of R-functons s used for constructon of each prmtve. Let s consder some of the prmtves. For example, the regular polygon. It wll be bult wth parameters of ts crcumcrcle and a number of polygon vertces. Then, durng the transton to polar coordnates we use the followng formulas ρ = ( x ) ( y y ) xrp + rp, -331-
PDCS 013 (Ukrane, Kharkv, March 13-14, 013) where y y rp arctan, when x x rp y y rp arctan + π, when x x rp θ = y y rp arctan + π, when x x rp y y rp arctan + π, when x x rp x rp, y rp are coordnates of regular polygon s crcumcenter. Let s ntroduce the parameter x > 0, y > 0 x < 0, y > 0 x < 0, y < 0 x > 0, y < 0 8 m = + 1 θ n ( 1) µ ( 1) sn π n = 1 ( 1) where n s a number of polygon s vertces, and m a a number of terms. Normalzed functon of a regular polygon s descrbed by: ω rp = ( ρ cos µ rrp ), (4) where r rp s a rados of polygon s crcumcrcle. Normalzed functon of the rng of a regular polygon s descrbed by: f1 = ( ρ cos µ rrp ), f = ( ρ cos µ ( rrp drp) ), where ωrrp = f 1 α f, (5) d rp s a wdth of the rng of a regular polygon. To rotate the angle ϕ rp of a regular polygon t s needed to replace varables x and y nvolved n the calculaton of the parameters used n equatons (4) - (5) wth varables x rrp and y rrp by the followng formula: x rrp = cosϕ rp ( x xrp ) sn ϕrp ( y yrp ) + xrp, y rrp = sn ϕ rp ( x xrp ) + cosϕrp ( y yrp ) + yrp. The dalog for the regular polygon (Fgure 1) allows the user to automatcally construct a regular polygon by gven coordnates of the center and the radus of the crcle. Fg.1. Dalog and pre-dalog for edtng the regular polygon -33-
PDCS 013 (Ukrane, Kharkv, March 13-14, 013) The default number of vertces of the polygon s set to three, but can be changed to a hgher number (lmted to 50, as a vsual polygon becomes a crcle) n the pre-dalog (Fgure 1). The translator ncluded n «AutoOmega» system allows to automatcally generate the lnes of code n problemorented language for a specfc scentfc system such as MATLAB or POLYE-RL based on the parameters, and to run t to executon for solvng a specfc boundary value problem. Translator dalog box s dvded nto two panels. The left pane shows the contents of the AO-fle n format of «AutoOmega» system, whch stores settngs of prmtves. The rght pane s flled wth the content of the automatcally generated fle n a problem-orented language. Parameters of prmtves are used for constructon of the support functons and the functons of a complex doman. The result s a pcture of the level lnes of the feld n the doman, bult n the «AutoOmega» system. Ths system was customzed for the problems of researchng the temperature felds on the board wth heat sources emttng the heat wth known ntensty [4], and problems of calculatng the metal products wth a known profle of complex shape, such as a ral (Fgure ), Larsen rabbet.. Natural Elements Method Fg.. Model of a ral constructed n «AutoOmega» and calculated n POLYE-RL The meshless method of natural elements (NEM) was selected as a method for solvng boundary value problems n «AutoOmega». It can be descrbed as a method that uses Sbson and non-sbsonan functons as nterpolants. The mesh s not requred for the nterpolants constructon. Intally, natural neghbor nterpolaton of the elements were ntroduced by Sbson for smoothng the scattered data[5]. It s based on the Vorono cells T defned as T = { x R : d( x, x ) < d( x, x j ) j }, where d ( x, x j ) s a dstance (a Eucldean norm) between x and Sbson functons or functons of natural neghbor elements are defned as rato of the polygonal areas of the Vorono dagram. Hence, A ( x) Φ ( ) x =, A( x) where A ( x) = Tx s a total area of the Vorono cell for x and A ( x) = T Tx s an area of partal coverage of - node by Vorono cell (Fgure 3). x j. -333-
PDCS 013 (Ukrane, Kharkv, March 13-14, 013) Fg.3. Constructon of Sbson functon Key dfferences that separate the NEM from the fnte elements method (FEM) le n the desgn and numercal mplementaton of shape functons, and n the methodology used for assemblng the stffness matrx. The algorthm proposed by Watson [6] for constructng the shape functon was mplemented. All other steps are common to both methods..3 CUDA applance CUDA (an acronym for Compute Unfed Devce Archtecture) for Nvda s graphcs processng unts of 8-seres or hgher was used for parallel computng n the «AutoOmega» system. The followng parameters were used: CPU Intel Core 7-3770K; GPU NVIDIA GeForce GTX 680 (8 Multprocessors x 19 CUDA Cores/MP = 1536 CUDA Cores); Mcrosoft Vsual Studo 010 Express Edton (C++); NVIDIA CUDA Toolkt v5.0. Workflow of the program wth CUDA had the followng specfcs: Copyng the data from the man memory to the memory of the GPU; CPU allows the GPU to perform the task; GPU executes n parallel on each of ts cores; Copyng the result from the GPU memory to the man memory. A smplest program sample to calculate the R-conjuncton of two real arguments usng CUDA s stated below. #nclude <ostream> devce float R_ANDem(float a, float b ) { float alpha = 0.5; return 1.0/(1+alpha)*(a+b-sqrt((a*a+b*b-*alpha*a*b)*1.0)); } global vod R_AND( float a, float b, float *c ) { *c = R_ANDem( a, b ); } nt man( vod ) { float c; nt *dev_c; HANDLE_ERROR( cudamalloc( (vod**)&dev_c, szeof(nt) ) ); R_AND <<<1,1>>> (, 7, dev_c ); -334-
PDCS 013 (Ukrane, Kharkv, March 13-14, 013) HANDLE_ERROR(cudaMemcpy( &c, dev_c, szeof(float), cudamemcpydevcetohost ) ); prntf( "Result = %5.f\n", c ); cudafree( dev_c ); return 0; } The calculaton of a sngle analytc functon, descrbng the geometrcal object and consstng of a number of support functons, n each pont of specfed area needs the parallelzaton. It s obvous that the efforts for computng the functon of a complex geometrcal object wll ncrease wth the number of the support functons f such a process s sequental. The dea of parallel computng n ths case s based on the fact that the problem of computng the functon of a complex geometrcal object s dvded nto a set of tasks for computng the support functons, whch can be solved smultaneously. In ths case, the dea was mplemented wth the Nvda s CUDA. For comparson purposes, calculatons of varous functons wth R-operatons and wthout them were carred out n the two-dmensonal doman (sutable for «AutoOmega» system) and n three-dmensonal doman (sutable for currently n development «AutoOmega 3D» system that s a successor of «AutoOmega» system s deas). Calculatons were performed usng both the CPU and GPU. The results are presented n Table 1. Tab.1. Total tme of calculaton of the functons n each pont of the threedmensonal doman of 4000 x 4000 x 40 ponts n sze, n seconds Functon CPU GPU Unbounded cylnder.44 0.01 R-conjuncton of scalar values (alpha = 0.5) 11.89 0.01 R-conjuncton of two cylnders 14.65 0.01 Prsm wth a convex quadrangle as a base 1.3 0.01 R-dsjuncton of two spheres 13.9 0.01 Parallelepped 14.5 0.01 3 Concluson In general, Nvda s CUDA technology s orented on solvng the dynamcal problems of graphcal and computatonal knds. In ths artcle ts applance to stage of computng the sngle analytc functon constructed based on the constructve apparatus of the theory of R-functons was analyzed. Obtaned results allow hghlght ts benefts and lmtatons. In general, the use of CUDA technology has sgnfcantly reduced the tme spent on computng. References [1] V. L. Rvachev Theory of R-functons and ts several applcatons. Kev: Nauk. dumka, 198. 55 p. [n Russan] [] V. L. Rvachev, A. P. Slesarenko Algebra of logc and ntegral transforms n boundary value problems Kev: Nauk. dumka, 1977. 87 p. [n Russan] [3] K. V. Maksmenko-Sheyko, A. M. Matsevty, A. V. Tolok, T. I. Sheyko Constructve apparatus of R-functons method for automaton of constructng the equatons of complex geometrcal objects. Bulletn of Zaporozhye State Unversty, : 66-76, 005. [n Russan] [4] K. V. Maksmenko-Sheyko, R. A. Uvarov, T. I. Sheyko Mathematcal and computer modellng of heat modes of radoelectroncs tools by R-functons method. Electronc modelng. 31(4): 79-87, 009. [n Russan] [5] R. Sbson A vector dentty for the Drchlet tessellaton. Math. Proc. Cambrdge Phlos. Soc. 87: 151-155, 1980. [6] N. Sukumar, B. Moran, T. Belytschko The Natural Element Method n Sold Mechancs. Int. J. Numer. Meth. Engng. 43: 839-887, 1998. -335-