6 Implementation of Parallel FE Systems

6.1 Implementation of Domain Decomposition in MSC.NASTRAN V70.7
6.2 Further Parallel Features of MSC.NASTRAN V70.7
    6.2.1 Parallel Normal Modes Analysis
    6.2.2 Parallel Direct Frequency Response Analysis
6.3 Hints for Writing Your Own Parallel FEM Programs
    6.3.1 Components for FEM Subtasks
    6.3.2 Components for Parallel Solution of Linear Systems
6.4 Questions for Exams

6.1 Implementation of DD in MSC.NASTRAN

- Parallel linear static analysis in MSC.NASTRAN V70.7 will be based on domain decomposition ("parallel SOL 101").
- A job is submitted, for example, by:
    nastran crank mem=20m dmp=4
- Parallel SOL 101 consists of the following major steps:
    - All processors read the input file (data deck), for example crank.dat, which contains the complete FE model.
    - All processors then execute the automatic partitioning algorithm to create as many domains as there are processors available.
    - From then on, each processor continues to work on its local domain only.
    - Each processor performs local element matrix generation, element matrix assembly and constraint elimination.
    - All processors cooperate to compute a globally correct solution (for a direct solution of the linear system they have to cooperate in the decomposition and the forward-backward substitution, FBS).
    - Each processor executes a local data recovery.
    - If requested by the submittal keyword mergeresults=yes, the data is finally collected onto processor 1 ("master").

6.1 Implementation of DD in MSC.NASTRAN

- Parallel SOL 101 is an SPMD program. During its execution, each processor executes the following major steps:
    - IFP*
    - start up
    - SEQP
    - EMG
    - EMA
    - UPARTN
    - DCMP (decomposition of interior dofs, chapter 5)
    - DISDCMP (decomposition of boundary dofs)
    - FBS (forward pass on interior dofs)
    - DISFBS (forward and backward pass on boundary dofs)
    - FBS (backward pass on interior dofs)
    - SDR*
    - DISOFPM/S (collection of results)
    - EXIT
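The split into interior and boundary dofs in the module list above can be illustrated with a small Schur complement solve. This is a minimal sketch, assuming a dense toy system and plain Gaussian elimination in place of the sparse DCMP/FBS and DISDCMP/DISFBS modules. The function names `solve`, `sub` and `schur_solve` are hypothetical and not part of MSC.NASTRAN, and the loop over domains stands in for work that the processors would do concurrently.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (toy sizes only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def sub(A, rows, cols):
    """Extract the block A[rows, cols]."""
    return [[A[r][c] for c in cols] for r in rows]


def schur_solve(K, f, domains, boundary):
    """domains: interior dof lists, one per processor; boundary: shared dofs."""
    nb = len(boundary)
    S = sub(K, boundary, boundary)      # boundary block, reduced below
    g = [f[b] for b in boundary]
    local = []
    for interior in domains:            # one iteration = work of one processor
        Kii = sub(K, interior, interior)
        Kib = sub(K, interior, boundary)
        Kbi = sub(K, boundary, interior)
        # Interior decomposition and forward pass: Kii^-1 Kib and Kii^-1 f_i
        cols = [solve(Kii, [row[j] for row in Kib]) for j in range(nb)]
        yi = solve(Kii, [f[i] for i in interior])
        # Local contribution to the boundary (Schur complement) system
        for a in range(nb):
            for j in range(nb):
                S[a][j] -= sum(Kbi[a][m] * cols[j][m] for m in range(len(interior)))
            g[a] -= sum(Kbi[a][m] * yi[m] for m in range(len(interior)))
        local.append((interior, cols, yi))
    ub = solve(S, g)                    # boundary dofs: needs all processors
    u = [0.0] * len(f)
    for b, val in zip(boundary, ub):
        u[b] = val
    for interior, cols, yi in local:    # backward pass on interior dofs
        for m, i in enumerate(interior):
            u[i] = yi[m] - sum(cols[j][m] * ub[j] for j in range(nb))
    return u


# Toy model: chain of 5 dofs, two domains sharing the middle dof.
K = [[2, -1, 0, 0, 0],
     [-1, 2, -1, 0, 0],
     [0, -1, 2, -1, 0],
     [0, 0, -1, 2, -1],
     [0, 0, 0, -1, 2]]
f = [1.0] * 5
u = schur_solve(K, f, domains=[[0, 1], [3, 4]], boundary=[2])
```

Only the boundary system `S` couples the domains, which is why the interior steps can run without communication.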

6.1 Implementation of DD in MSC.NASTRAN

- Possible design alternative: the model could be split a priori into domains by the preprocessor, so that each processor would be fed a local input file containing only its local domain.
    - This would make IFP parallel.
    - SEQP would be obsolete.
    - But: the number of processors would be fixed. Queuing systems are often used in industrial environments, where the user may want to specify a range for the desired number of processors (example: min. 4, max. 8).
- Example for results with parallel linear static analysis in MSC.NASTRAN (V70.7 development system). FEM description:
    GRIDs  : 49,932
    HEXAs  : 39,448
    PENTAs : 5,752
    DOFs   : 148,770
    1 load case

6.1 Implementation of DD in MSC.NASTRAN

- Elapsed times on IBM RS/6000 SP (MSC) with direct solution
  [Chart: elapsed time in seconds vs. number of processors (1, 2, 4, 8)]

6.1 Implementation of DD in MSC.NASTRAN

- Speedups on 8 processors for selected modules with direct solution
  [Chart: speedup per module: startup, EMG, EMA, DCMP, FBS (87.8), SDR, total]
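The speedup figures on this slide can be read with the usual definitions, speedup = serial time / parallel time and efficiency = speedup / number of processors. The slides do not spell these out, so they are assumed here. The sketch below uses hypothetical timings, not the measured values, which are not preserved in this text.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nproc):
    """Parallel efficiency; values above 1 indicate superlinear speedup."""
    return speedup(t_serial, t_parallel) / nproc


# Hypothetical elapsed times in seconds (NOT the measurements from the slides).
timings = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}
table = {p: (speedup(timings[1], t), efficiency(timings[1], t, p))
         for p, t in timings.items()}

# If the surviving FBS annotation 87.8 is read as a speedup on 8 processors,
# the corresponding efficiency is far above 1 (superlinear).
fbs_efficiency = 87.8 / 8
```

A superlinear value for a single module is plausible when the local problem fits into memory or cache on each processor while the serial problem does not.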

6.1 Implementation of DD in MSC.NASTRAN

- Maximum local disk space with direct solution
  [Chart: disk space in megabytes (axis mark 1500) vs. number of processors (1, 2, 4, 8)]

6.1 Implementation of DD in MSC.NASTRAN

- Accumulated maximum local disk space with direct solution
  [Chart: disk space in megabytes (axis mark 2000) vs. number of processors (1, 2, 4, 8)]

6.1 Implementation of DD in MSC.NASTRAN

- Elapsed times on SUN E6000 with iterative solution
  [Chart: elapsed time in seconds vs. number of processors (1, 2, 4, 8)]

6.1 Implementation of DD in MSC.NASTRAN

- Optimal memory in SOLVIT for runs with iterative solution on SUN E6000
  [Chart: memory in MB vs. number of processors (1, 2, 4, 8)]
- If there is time: view log files and f04 files for serial/parallel runs.

6.1 Implementation of DD in MSC.NASTRAN

- Parallel linear static analysis is useful for large models, for example if the local disk is insufficient.
- Very useful for MSC.CONSTRUCT.
- Postprocessing of the created domains with MSC.PATRAN:
  [Figure]

6.1 Implementation of DD in MSC.NASTRAN

- V70.7 prototype successfully tested on an IBM RS/6000 Model 590 workstation cluster with switched Fast Ethernet.
- Dedicated talk about this project at the MSC Worldwide Automotive Users Conference in September.
- Results without data recovery (which would further increase speedups):
  [Chart: elapsed time in minutes vs. number of processors (1, 2, 4)]
- FEM description:
    GRIDs  : 225,885
    QUAD4s : 205,848
    TRIA3s : 12,955
    RODs   : 364
    BARs   : 252
    ELAS1s : 516
    RBE2s  : 588
    DOFs   : 1,348,348
    1 load case

6.2 Further Parallel Features of MSC.NASTRAN V70.7

- V70.7 will be available in October 1999.
- Distributed parallel analysis types in MSC.NASTRAN V70.7:
    - parallel linear static analysis
    - parallel normal modes analysis
    - parallel direct frequency response analysis
- Based on MPI.
- Supported platforms (current plan): IBM, SUN, SGI, HP, NEC, Fujitsu, Compaq, Intel Windows NT.
    - V70.7 supports parallel compute servers (IBM SP, HP V-class, etc.).
    - It will also work on selected workstation clusters (IBM, HP, SUN, SGI).

6.2.1 Parallel Normal Modes Analysis

- Normal modes analysis = eigenfrequency analysis (SOL 103 in MSC.NASTRAN).
- Example: vibration of car bodies due to rotations of the engine, bumps on roads, etc.
- In linear static analysis, a linear system K u = f has to be solved, for example by a decomposition followed by an FBS.
- In normal modes analysis, a generalized eigenvalue problem has to be solved:
    K x = λ M x
    K: stiffness matrix, M: mass matrix, x: eigenvector
- Several methods are available. The most efficient method in FEM today is the Lanczos algorithm and its variants (see literature).
- Typical task of normal modes analysis: determine all eigenfrequencies between 0 and 300 Hz.
- Parallelization approach: the frequency range is split into segments, and each processor computes the modes in its segment independently.
  [Figure: frequency range with bounds F1, F2, segments assigned to Processor 1, Processor 2, ...]
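The segment approach can be sketched as follows. This is an illustration under stated assumptions: the equal-width split is a placeholder for the heuristic formula that MSC.NASTRAN uses (not given in the slides), and the function names are hypothetical. The conversion λ = (2πf)² relates a frequency f in Hz to the eigenvalue of K x = λ M x.

```python
import math
from bisect import bisect_right


def segment_bounds(f_min, f_max, nproc):
    """Equal-width frequency bounds F_0..F_nproc (an assumed, simplistic split)."""
    width = (f_max - f_min) / nproc
    return [f_min + k * width for k in range(nproc + 1)]


def to_eigenvalue(freq_hz):
    """Eigenvalue lambda = omega^2 = (2 pi f)^2 for a frequency in Hz."""
    return (2.0 * math.pi * freq_hz) ** 2


def owner(freq_hz, bounds):
    """Index of the processor whose segment [F_k, F_k+1) contains freq_hz."""
    k = bisect_right(bounds, freq_hz) - 1
    return min(k, len(bounds) - 2)  # the upper end belongs to the last segment


bounds = segment_bounds(0.0, 300.0, 4)
lambda_bounds = [to_eigenvalue(fb) for fb in bounds]
```

Each processor would then run its eigensolver only for eigenvalues between its two entries of `lambda_bounds`.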

6.2.1 Parallel Normal Modes Analysis

- Frequency bounds can be given by the user or are computed by MSC.NASTRAN with a heuristic formula.
- Example: CASA satellite, normal modes analysis between 0 and 150 Hz, 209 modes.
- Automatic frequency distribution on 4 processors:
    47 EIGENVALUES FOUND IN DISTRIBUTED SEGMENT # 1
    58 EIGENVALUES FOUND IN DISTRIBUTED SEGMENT # 2
    49 EIGENVALUES FOUND IN DISTRIBUTED SEGMENT # 3
    55 EIGENVALUES FOUND IN DISTRIBUTED SEGMENT # 4
- Elapsed times on IBM RS/6000 SP (66 MHz, POWER2):
  [Chart: elapsed time in seconds vs. number of processors (1, 2, 4, 8)]
- FEM description:
    GRIDs  : 12,283
    QUAD4s : 5,999
    TRIA3s : 9,044
    ELAS1s : 10,231
    BARs   : 565
    BEAMs  : 213
    CONM2s : 115
    dofs   : 65,…
    … modes
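The four segment counts reported above can be checked against the stated total of 209 modes, and they give a rough idea of the load balance. The imbalance measure used here (largest segment divided by the mean) is an assumption chosen for illustration, not a quantity from the slides.

```python
# Eigenvalue counts per distributed segment, as printed by the run above.
counts = [47, 58, 49, 55]

total = sum(counts)                 # should match the 209 modes of the example
mean = total / len(counts)          # ideal number of modes per processor
imbalance = max(counts) / mean      # 1.0 would be a perfectly balanced split
```

With these counts the busiest processor has about 11 percent more modes than the average, so the heuristic split is reasonably balanced for this model.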

6.2.1 Parallel Normal Modes Analysis

- Further example: large acoustic analysis of a car body
  [Figure: VOLVO 850, panels "structure" and "air"; "With courtesy of" credit]
- FEM description (structure):
    GRIDs   : 128,659
    QUAD4s  : 121,912
    TRIA3s  : 1,234
    BARs    : 104
    CELAS2s : 1,410
    CONM2s  : 295
    RBE2s   : 102
    RBE3s   : 25
    DOFs    : 768,…
    … eigenmodes
- FEM description (air):
    GRIDs   : 7,898
    HEXA8s  : 3,964
    PENTA6s : 1,352
    TETRA4s : 5,618
    DOFs    : 7,…
    … eigenmodes
- 56 GB disk, 1.6 TB I/O transferred
- 140 frequency steps

6.2.1 Parallel Normal Modes Analysis

- Results with V70.0 (serial)
    - CRAY J90
    - CRAY T90
    - HP V2250 / PA8200 / 240 MHz, 16 GB main memory, Ultra SCSI, 16 disks, 4 controllers
  [Chart: time in hours for CRAY J90, CRAY T90, V2250]
- Results with V70.7 (serial and parallel)
    - HP V2250 / PA8200 / 240 MHz, 16 GB main memory, Ultra SCSI, 16 disks, 4 controllers
    - HP V2500 / PA8500 / 440 MHz, 16 GB main memory, fibre channel array (10 disks, 2 controllers)
    - HP N4000 / PA8500 / 360 MHz, 8 GB main memory, fibre channel array (10 disks, 2 controllers)
  [Chart: time in hours, serial vs. 4 proc., for V2250, V2500, N4000]

6.2.2 Parallel Direct Frequency Response Analysis

- Frequency response analysis computes the responses to oscillatory excitation.
- Example: response of car components to excitations resulting from rotations of the engine.
- In direct frequency response (SOL 108), the equation
    [-ω² M + iω B + K] u(ω) = P(ω)
  is solved directly for each frequency by DCMP followed by FBS.
- Parallelization is straightforward: the frequency range is split among the processors, and each processor computes the responses to its local frequencies without interprocessor communication.
- SOL 108 also contains parallel data recovery: each processor does local data recovery on its local frequencies. At the end, the results are collected on the master via the new DISOFPM/S modules or left local (depending on the mergeresults=yes/no setting).
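For a single degree of freedom the frequency response equation reduces to a scalar complex division, which makes the structure of the per-frequency solve easy to see. The sketch below is such a scalar stand-in, assuming mass m, damping b, stiffness k and load p as plain numbers. In SOL 108 the same step is a complex matrix decomposition (DCMP) followed by FBS. The function name `response` is hypothetical.

```python
import math


def response(m, b, k, p, omega):
    """Solve (-omega^2 m + i omega b + k) u = p for one dof at one frequency."""
    dynamic_stiffness = complex(k - omega ** 2 * m, omega * b)
    return p / dynamic_stiffness


# Illustrative values (assumed): m = 1, b = 0.5, k = 4, unit load.
m, b, k, p = 1.0, 0.5, 4.0, 1.0
u_static = response(m, b, k, p, 0.0)                  # omega = 0: static case p / k
u_resonance = response(m, b, k, p, math.sqrt(k / m))  # only damping limits the response
```

Each frequency is an independent solve, which is why the frequencies can be distributed among processors without communication.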

6.2.2 Parallel Direct Frequency Response Analysis

- In a parallel SOL 108 run, each processor performs the following steps:
    - read the full input deck
    - build element matrices and assembled matrices for the full model
    - eliminate constraints
    - determine its local frequency segment
    - compute the responses to the frequencies in its local segment
    - do local data recovery
    - collect the results on the master or leave them local. In the latter case, MSC.PATRAN, for example, can be used to view all local postprocessing files simultaneously (MSC.PATRAN automatically picks the results from the corresponding local results (= xdb) file).
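The step "determine local frequency segment" can be sketched as a contiguous block split of the frequency list. The slides do not say how MSC.NASTRAN assigns frequencies, so the block distribution below is an assumption, and `local_frequencies` is a hypothetical helper. Ranks are numbered from 0 here.

```python
def local_frequencies(freqs, nproc, rank):
    """Contiguous block of freqs for processor `rank` (0-based), sizes differ by at most 1."""
    base, extra = divmod(len(freqs), nproc)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return freqs[start:stop]


# 100 frequency steps distributed over 8 processors.
freqs = [float(i) for i in range(1, 101)]
blocks = [local_frequencies(freqs, 8, r) for r in range(8)]
sizes = [len(blk) for blk in blocks]

# Collecting on the master corresponds to concatenating the blocks in rank order.
merged = [fq for blk in blocks for fq in blk]
```

Because every processor can compute its block from the rank alone, no communication is needed until the optional merge at the end.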

6.2.2 Parallel Direct Frequency Response Analysis

- Example: exhaust manifold
- Elapsed times on IBM RS/6000 SP (POWER2, 66 MHz), 100 frequency steps
  [Chart: elapsed time in minutes vs. number of processors (1, 2, 4, 8)]
- FEM description:
    GRIDs  : 10,800
    QUADs  : 6,305
    TRIAs  : 337
    HEXAs  : 1,899
    PENTAs : 669
    TETRAs : 21
    DOFs   : 49,309

6.2.2 Parallel Direct Frequency Response Analysis

- Influence of parallel data recovery on 4 processor speedups (40 frequencies)
  [Chart: speedup for serial DR; parallel DR, 1 xdb; parallel DR, 4 xdb]
- XY plot created from multiple xdb files with MSC.PATRAN
  [Figure]

6.2.2 Parallel Direct Frequency Response Analysis

- Car body, ~240,000 dofs, 100 frequencies
- Elapsed times on SGI Origin 2000 (300 MHz R12000 processors, 8 GB memory)
- Similar picture:
  [Chart: elapsed time in minutes vs. number of processors (1, 2, 4, 8)]

6.3 Hints for Writing Your Own Parallel FEM Programs

6.3.1 Components for FEM Subtasks

- Major steps:
    - input file reading: can be implemented using a finite state machine
    - model partitioning: public domain software is available, for example METIS (www-users.cs.umn.edu/~karypis/metis/metis/metis.shtml)
    - (parallel) element matrix generation: see for example the book by Schwarz (FORTRAN-Programme zur Methode der finiten Elemente)
    - (parallel) element matrix assembly: easy to implement
    - (parallel) constraint elimination: easy to implement
    - parallel solution of linear systems: see 6.3.2
    - (parallel) data recovery: easy to implement
    - data collection: also easy to implement
- The modular structure of MSC.NASTRAN is an excellent example of how to write a (parallel) FEM program.
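The hint about reading the input file with a finite state machine can be sketched as follows. The section delimiters CEND, BEGIN BULK and ENDDATA and the `$` comment character follow the NASTRAN deck layout. Everything else is a simplifying assumption: the function name `read_deck` is hypothetical, and continuation lines, fixed-field cards and include files are ignored.

```python
def read_deck(lines):
    """Split a deck into executive control, case control and bulk data sections."""
    state = "EXEC"
    sections = {"EXEC": [], "CASE": [], "BULK": []}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("$"):
            continue                      # skip blank lines and comments
        key = line.upper()
        if state == "EXEC" and key == "CEND":
            state = "CASE"                # end of executive control
        elif state == "CASE" and key == "BEGIN BULK":
            state = "BULK"                # end of case control
        elif state == "BULK" and key == "ENDDATA":
            state = "DONE"                # end of bulk data
        elif state == "DONE":
            raise ValueError("data after ENDDATA: " + line)
        else:
            sections[state].append(line)  # ordinary card in the current section
    if state != "DONE":
        raise ValueError("deck ended in state " + state)
    return sections


deck = [
    "SOL 101",
    "CEND",
    "$ one static load case",
    "LOAD = 1",
    "BEGIN BULK",
    "GRID,1,,0.0,0.0,0.0",
    "GRID,2,,1.0,0.0,0.0",
    "ENDDATA",
]
sections = read_deck(deck)
```

The state variable makes illegal orderings easy to detect, for example a deck that ends before ENDDATA.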

6.3.2 Components for Parallel Solution of Linear Systems

- PSPASES
    - public domain parallel multifrontal solver
    - requires the input matrix and right hand side, for example in Rutherford-Boeing format (see later)
    - winter.cs.umn.edu/~mjoshi/pspases
- PARASOL
    - project funded by the European Union for developing and evaluating parallel sparse matrix solvers
    - started in 1996, will end in August
    - parallel solvers have been developed which will be put into the public domain
    - PSL_PS by GMD, Germany: parallel iterative solver with multigrid preconditioning
    - PSL_MUMPS by RAL, England, and CERFACS, France: parallel direct solver based on the multifrontal method

6.3.2 Components for Parallel Solution of Linear Systems

- PSL_DDM by Parallab, Norway: iterative solver based on domain decomposition
- PSL_FETI by Onera, France: iterative solver based on domain decomposition
- The PARASOL solvers are available as a library. All library routines can be called from your own programs using the PARASOL library interface. Some routines are:
    - psl_init: initialize PARASOL, e.g. select solver
    - psl_map: compute mapping of data to processors
    - psl_solve: solve the linear system
    - psl_end: end PARASOL
- The PARASOL software distribution also offers a PARASOL test driver, which reads data in PARASOL file format and outputs the solution. Sample call on IBM RS/6000 SP:
    ptd -um -y -d/tmp/sm/d_bmw3 bmw3_1.mtx bmw3_1.rhs bmw3_1.sln -mi=1,1 -procs 8 -euilib us -labelio yes

6.3.2 Components for Parallel Solution of Linear Systems

- The PARASOL file format is an application of the Rutherford-Boeing file format.
    - It allows storing all data which a parallel solver might need in a standardized format (even geometry).
    - Example: for this project a new MSC.NASTRAN module called PARASOL has been developed, which outputs the following data for each domain:
        <testcase>di.mtx: assembled and constrained local stiffness matrix (PSL_MATRIX)
        <testcase>di.rhs: assembled and constrained local right hand side (PSL_RHSIDE)
        <testcase>di.vat: variable types (PSL_VARTYPE)
        <testcase>di.fet: finite element types (PSL_CELLTYPE)
        <testcase>di.gcl: grid cell list (lists of grids of each element) (PSL_GRIDCELL)
        <testcase>di.gnd: grid node list (lists variables for each grid) (PSL_GRIDNODE)

6.3.2 Components for Parallel Solution of Linear Systems

- PARASOL file format (cont'd)
    <testcase>di.geo: coordinates of grids (PSL_NODEPOS)
    <testcase>di.ivr: list of local interface variables in local numbering (PSL_INTERVAR)
    <testcase>di.v2g: mapping of all local variables, including interface variables, to global variables (PSL_VAR2GLOB)
- Example: 2x2 cube, split into 2 domains
  [Figure: domain 1, domain 2]
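The naming scheme of the per-domain files can be written down as a small lookup. This is a sketch under one explicit assumption: that "di" in `<testcase>di.<ext>` stands for the letter d followed by the domain number i. The extensions and PSL_ tags are those listed on the slides, while `domain_files` is a hypothetical helper.

```python
# File extension -> PARASOL data tag, as listed for the PARASOL file format.
EXTENSIONS = {
    "mtx": "PSL_MATRIX",
    "rhs": "PSL_RHSIDE",
    "vat": "PSL_VARTYPE",
    "fet": "PSL_CELLTYPE",
    "gcl": "PSL_GRIDCELL",
    "gnd": "PSL_GRIDNODE",
    "geo": "PSL_NODEPOS",
    "ivr": "PSL_INTERVAR",
    "v2g": "PSL_VAR2GLOB",
}


def domain_files(testcase, domain):
    """Map each PARASOL tag to the file name for one domain (assumed naming)."""
    return {tag: f"{testcase}d{domain}.{ext}" for ext, tag in EXTENSIONS.items()}


# Hypothetical test case name "cube" with two domains.
files = {i: domain_files("cube", i) for i in (1, 2)}
```

A driver program could loop over `files[i]` on processor i to load exactly the data of its own domain.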

6.3.2 Components for Parallel Solution of Linear Systems

Grid, element and variable numbers for domain 1
[Figure: domain 1 with coordinate axes x, y, z, element numbers, and grid numbers with their variable numbers in parentheses]

    grid  variables
    1     (-)
    2     (-)
    3     (1,2,3)
    5     (7,8,9)
    6     (10,11,12)
    7     (13,14,15)
    8     (16,17,18)
    9     (19,20,21)
    10    (22,23,24)
    11    (25,26,27)
    12    (28,29,30)
    13    (31,32,33)
    14    (34,35,36)
    15    (37,38,39)
    16    (40,41,42)
    17    (43,44,45)
    18    (46,47,48)
    19    (49,50,51)
    20    (52,53,54)
    21    (55,56,57)

6.3.2 Components for Parallel Solution of Linear Systems

mtx file for domain 1
[File listing: title "stiffness matrix", type rsa, formats (3I14) (3I14) (1P,3E25.16E3), key BC2-_1D]
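The type code rsa in the mtx header stands for a real, symmetric, assembled matrix, which Rutherford-Boeing files store column by column as the lower triangle with column pointers, row indices and values, all 1-based. The sketch below builds these three arrays for a small dense matrix and formats integers three per line in width 14, as the (3I14) format in the header suggests. The helper names are hypothetical, and the header lines of a real file are not written.

```python
def to_rsa(A):
    """Lower triangle of symmetric A in compressed column form, 1-based indices."""
    n = len(A)
    colptr, rowind, values = [1], [], []
    for j in range(n):
        for i in range(j, n):             # diagonal and below only
            if A[i][j] != 0.0:
                rowind.append(i + 1)
                values.append(float(A[i][j]))
        colptr.append(len(values) + 1)    # start of the next column
    return colptr, rowind, values


def format_ints(vals, per_line=3, width=14):
    """Mimic a Fortran (3I14) record layout: per_line integers of fixed width."""
    return ["".join(f"{v:{width}d}" for v in vals[s:s + per_line])
            for s in range(0, len(vals), per_line)]


# Small symmetric test matrix (tridiagonal stiffness of a 3-dof chain).
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
colptr, rowind, values = to_rsa(A)
pointer_lines = format_ints(colptr)
```

Storing only the lower triangle roughly halves the file size for the symmetric stiffness matrices produced by FEM.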

6.3.2 Components for Parallel Solution of Linear Systems

rhs file for domain 1
[File listing: title "right hand side(s)", type rhsrd r, format (1P,1E25.16E3), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

vat file for domain 1
[File listing: title "variable types", type avl i, dimensions 57 1, format (1I4), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

fet file for domain 1
[File listing: title "finite element types", type avl i, dimensions 4 1, format (1I4), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

gcl file for domain 1
[File listing: title "finite element list", type icvs p, formats (6I10) (6I10), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

gnd file for domain 1
[File listing: title "node list (number of variables per node)", type ipts p, formats (1I10) (1I10), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

geo file for domain 1
[File listing: title "node coordinates", type geo s r, dimensions 21 3, format (1E25.16), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

ivr file for domain 1
[File listing: title "interface variables (PSL_INTERVAR)", type icvs p, formats (1I10) (1I10), key BC2-_1D1]

6.3.2 Components for Parallel Solution of Linear Systems

v2g file for domain 1
[File listing: title "mapping of local to global indices (PSL_VAR2GLOB)", type ipts p, formats (1I10) (1I10), key BC2-_1D1]
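The local-to-global mapping is what allows the per-domain solutions to be merged into one global vector. The sketch below assumes that the mapping is a list whose k-th entry is the 1-based global number of local variable k, which matches the numbering from 1 on the slides but is not confirmed by them. `scatter_to_global` is a hypothetical helper, and the values are made up.

```python
def scatter_to_global(n_global, domains, tol=1e-9):
    """Merge local solutions; domains is a list of (v2g, u_local) pairs."""
    u = [None] * n_global
    for v2g, u_local in domains:
        for g, val in zip(v2g, u_local):
            old = u[g - 1]                # v2g entries are 1-based
            if old is not None and abs(old - val) > tol:
                raise ValueError(f"interface variable {g} differs between domains")
            u[g - 1] = val
    return u


# Two hypothetical domains sharing global variable 3 (an interface variable).
domain1 = ([1, 2, 3], [0.1, 0.2, 0.3])
domain2 = ([3, 4, 5], [0.3, 0.4, 0.5])
u_global = scatter_to_global(5, [domain1, domain2])
```

The consistency check on interface variables is a cheap way to detect a solver or mapping error, since both domains must agree on shared values.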

6.4 Questions for Exams

- Describe the 7 steps of the (serial) FEM.
- Compare direct and iterative solution of linear systems.
- Classify parallel computer architectures.
- Describe 4 types of interconnection networks used in commercial parallel computers today.
- Parallel programming: shared memory vs. distributed memory.
- FEM parallelization approaches.
- Describe the 7+ steps of parallel FEM based on domain decomposition.
- Briefly describe the multifrontal method and its parallelization.