International Conference on Computational Science (ICCS 2017)

Exploiting Hybrid Parallelism in the Kinematic Analysis of Multibody Systems Based on Group Equations
G. Bernabé, J. C. Cano, J. Cuenca, A. Flores, D. Giménez, M. Saura-Sánchez Ω and P. Segado-Cabezos Ω
Computer Engineering Department, University of Murcia
Computer Science and Systems Department, University of Murcia
Ω Mechanical Engineering, Technical University of Cartagena
International Conference on Computational Science (ICCS 2017), June, 2017

Outline
Introduction and Motivation
Parallelism in the Structural Groups method
Results
Conclusions and Future work

Introduction
Multibody systems (MBS): mechanical systems formed by rigid and flexible bodies connected by mechanical joints in such a way that there is relative movement between the bodies.
Figure: the Stewart Platform (labels: terminal, handles, platform).

Introduction
The study of the relationships between the bodies is known as kinematic modeling.
It selects a vector q of coordinates to define the position and orientation of each body of the MBS in space.
The coordinates are related by means of a nonlinear system of constraint equations Φ(q) = 0.
Global formulations
Topological formulations: exploit the topology of the MBS to reduce the dimension of the problem by relating the position of each body to that of the preceding one.
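As an illustration of what solving Φ(q) = 0 involves, the following is a minimal sketch that applies Newton-Raphson to a hypothetical two-coordinate mechanism: a bar of fixed length pinned at the origin, with its end point q = (x, y) driven by an angle theta. The mechanism, the function name and the choice of Newton-Raphson are assumptions made for this example; the slides do not state which mechanism equations or nonlinear solver the authors use.

```python
import math

def newton_position(theta, length=1.0, q0=(0.5, 0.5), tol=1e-12, max_iter=50):
    """Solve Phi(q) = 0 for q = (x, y) with Newton-Raphson.

    Hypothetical constraints for a single bar pinned at the origin:
      Phi_1 = x^2 + y^2 - length^2      (rigid-length constraint)
      Phi_2 = x - length * cos(theta)   (driving constraint)
    """
    x, y = q0
    for _ in range(max_iter):
        phi1 = x * x + y * y - length * length
        phi2 = x - length * math.cos(theta)
        if max(abs(phi1), abs(phi2)) < tol:
            break
        # Jacobian Phi_q = [[2x, 2y], [1, 0]]; solve Phi_q * dq = -Phi.
        # Second row gives dx directly; first row then gives dy.
        # The Jacobian is singular when y == 0, which this sketch does not handle.
        dx = -phi2
        dy = (-phi1 - 2.0 * x * dx) / (2.0 * y)
        x, y = x + dx, y + dy
    return x, y

x, y = newton_position(math.pi / 3)
```

A real MBS has many more coordinates, so the 2x2 solve written out by hand here becomes a linear system with the constraint Jacobian, which is where a dense or sparse linear algebra library comes in.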

Introduction
Structural Analysis: splits the MBS into Structural Groups (SGs).
Kinematic Structure: how many SGs, their kind and their order.
The Stewart Platform:
terminal (SG-T), group (8): 12 dependent coordinates
handle-stick (SG-H), groups (2-7): 15 dependent coordinates

Motivation
1. A simulator for the computational kinematic analysis of MBS, which allows us to analyze the efficiency of the group equations.
2. A better exploitation of computer resources by applying parallelism to reduce execution times in real-time applications.

Parallelism in the Structural Groups method
The Stewart Platform (MBS) is used as a case study to analyze the application of parallelism for speeding up the kinematic analysis based on group equations.

Parallelism in the Structural Groups method
A scheme of the Group Equations method:

    for number of external iterations (tend*dt) do
        Solve kinematic of terminal (size nsg-t)          // MKL parallelism
        for all structural components (nsg) do            // OpenMP parallelism
            for number of internal iterations (tend2) do
                Solve kinematic of SC (size nsg-hs)       // MKL parallelism
            end for
        end for
    end for

Parallelism can be exploited by solving the problems for the SGs in the system simultaneously, inside a multicore system (MKL) or with calls to the GPU (MAGMA).

tend: maximum execution time
dt: time step
tend2: number of iterations for the position problem
nsg: number of structural groups
nsg-t: dimension of the SG-T matrix
nsg-hs: dimension of the SG-HS matrix

We have exploited the parallelism in different ways:
1. GEMKL: the multithreaded version of MKL.
2. GEOMP+MKL: OpenMP is used to start the threads, which work simultaneously on the solution of different SGs. The matrix problems for each group are solved by calling MKL, which can be sequential or multithreaded.
3. GEOMP+MA27: OpenMP parallelism is exploited, with calls to the routine MA27 for the solution of the matrix problems.
4. GEMAGMA: GPU parallelism is exploited by solving the matrix problems with MAGMA.
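The loop structure of the scheme, with the groups solved concurrently, can be sketched as follows. This is only a structural illustration under several assumptions: a thread pool stands in for the OpenMP threads, a plain Gaussian elimination stands in for the MKL call, and `make_system` builds a hypothetical diagonally dominant system instead of the real group equations. The number of external iterations is passed in directly as `n_external`. Because of the Python interpreter lock, this sketch shows the task decomposition and is not expected to reproduce the reported speed-ups.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (stand-in for a library solver)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def make_system(size, shift):
    """Hypothetical diagonally dominant system standing in for a group's equations."""
    a = [[(float(size + shift) if i == j else 1.0) for j in range(size)]
         for i in range(size)]
    b = [float(i + 1) for i in range(size)]
    return a, b

def solve_group(group_id, size, n_internal):
    """Inner loop of the scheme: repeated solves for one structural group."""
    x = None
    for _ in range(n_internal):
        a, b = make_system(size, shift=group_id + 1)
        x = solve_dense(a, b)
    return x

def group_equations(n_external, nsg, nsg_t, nsg_hs, n_internal, workers):
    """Outer loop: solve the terminal, then all groups concurrently."""
    results = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_external):
            a, b = make_system(nsg_t, shift=0)
            terminal = solve_dense(a, b)
            groups = list(pool.map(
                lambda g: solve_group(g, nsg_hs, n_internal), range(nsg)))
            results = (terminal, groups)
    return results

# Sizes of the smallest Stewart Platform problem: nsg=6, nsg-t=12, nsg-hs=15.
terminal, groups = group_equations(
    n_external=2, nsg=6, nsg_t=12, nsg_hs=15, n_internal=1, workers=3)
```

The groups are independent within one external iteration, which is what makes the middle loop the natural place for thread-level parallelism.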

Results
CPU Intel Core i5: 4 cores, no Hyper-Threading, 16 GB RAM.
MKL (GMKL) and MA27 (GMA27), dense and sparse solvers respectively, are used for the Global formulation.
GEOMP+MKL and GEOMP+MA27 are used for the Group Equations (topological formulation).

Results: Global formulations vs Group Equations
Table (numeric entries not preserved): columns Total size | nsg-hs | GMKL (time, threads) | GMA27 (time) | GEOMP+MKL (time, threads x threads) | GEOMP+MA27 (time, threads)
Experiments: the number of groups of the SP is nsg=6; nsg-t=12 and nsg-hs=15 for the smallest problem, and nsg-t is fixed to 24 and nsg-hs= for the other problems.
Total size represents the size of the matrices for the global formulation.
tend=200, dt=0.01, iterations, tend2=1 for the parallel algorithm for the Group Equations.

Multithreaded MKL is preferable to MA27 for small sizes (no sparsity).

The exploitation of sparsity through MA27 is advisable for large matrices: low complexity cost of MA27 (O(n)) vs MKL (O(n^3)).
The best results are obtained with 3 OpenMP threads.
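The gap between O(n) and O(n^3) can be made concrete by counting arithmetic operations. The sketch below is an illustration under a strong assumption: a tridiagonal system solved with the Thomas algorithm stands in for the sparse case, and Gaussian elimination without pivoting stands in for the dense case. Neither is MA27 or MKL, and the actual cost of a sparse solver depends on the sparsity pattern of the matrix.

```python
def solve_dense_counting(a, b):
    """Dense Gaussian elimination without pivoting; returns (x, multiply/divide count)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    ops = 0
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            ops += 1
            for j in range(k + 1, n):
                a[i][j] -= f * a[k][j]
                ops += 1
            b[i] -= f * b[k]
            ops += 1
    x = [0.0] * n
    for i in reversed(range(n)):
        s = b[i]
        for j in range(i + 1, n):
            s -= a[i][j] * x[j]
            ops += 1
        x[i] = s / a[i][i]
        ops += 1
    return x, ops

def solve_tridiagonal_counting(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are unused. Returns (x, op count)."""
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    ops = 0
    for i in range(1, n):
        f = lower[i] / d[i - 1]
        d[i] -= f * upper[i - 1]
        r[i] -= f * r[i - 1]
        ops += 3
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    ops += 1
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
        ops += 2
    return x, ops

# Hypothetical test matrix: diagonal 4, off-diagonals -1 (diagonally dominant).
n = 30
lower = [0.0] + [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1) + [0.0]
rhs = [1.0] * n
dense = [[0.0] * n for _ in range(n)]
for i in range(n):
    dense[i][i] = diag[i]
    if i > 0:
        dense[i][i - 1] = lower[i]
    if i < n - 1:
        dense[i][i + 1] = upper[i]

x_dense, dense_ops = solve_dense_counting(dense, rhs)
x_tri, tri_ops = solve_tridiagonal_counting(lower, diag, upper, rhs)
```

Both routines return the same solution, but the dense count grows with the cube of n while the tridiagonal count grows linearly, so the ratio `dense_ops / tri_ops` widens quickly as n increases.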

Results: Global formulations vs Group Equations
Figure: speed-up versus total size for GMA27/GMKL, GMA27/GEOMP+MKL and GMA27/GEOMP+MA27.
Speed-ups of the parallel versions of GE in relation to the GF with MA27.
The GE method clearly outperforms the GF.
Speed-ups of up to 8 for small sizes and close to 4 for the largest problems.

Results: application of parallelism for MBS larger than our case study
CPU Intel Xeon E5: 2 hexa-cores = 12 cores, 32 GB RAM.
Sequential MKL and MA27 (dense and sparse solvers) for the Group Equations.
GEOMP+MKL and GEOMP+MA27 are also used for the Group Equations.
nsg = 6, 16, 22
nsg-hs = 15, 30, 60, 120
nsg-t = 12 (for nsg-hs=15) and 24 in the other cases

Results: application of parallelism for MBS larger than our case study
Table (numeric entries not preserved): columns nsg | nsg-hs | MKL | MA27 | GEOMP+MKL | GEOMP+MA27
The improvement increases with the number of groups and the number of coordinates.
Exploiting sparsity becomes advantageous somewhere between 60 and 120 coordinates.
Two levels of parallelism can be used for a better exploitation of the parallelism in larger multicore systems.
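With two levels of parallelism, the cores are split between an outer level (threads working on different groups) and an inner level (threads inside each linear algebra call). A minimal sketch of how the candidate splits could be enumerated for a 12-core machine follows; the function names are hypothetical, and the rule that the outer level should not exceed the number of groups is an assumption based on there being only nsg independent tasks.

```python
def two_level_configs(cores):
    """All (outer, inner) thread pairs with outer * inner == cores."""
    return [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]

def usable_configs(cores, nsg):
    """Drop splits whose outer level has more threads than there are groups."""
    return [(p, q) for p, q in two_level_configs(cores) if p <= nsg]

all_12 = two_level_configs(12)
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
for_6_groups = usable_configs(12, nsg=6)
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2)]
```

Which split is fastest depends on the group sizes: small matrices give the inner level little work, which favours more outer threads.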

Results: Raspberry Pi versus Multicore Systems
Raspberry Pi: small and cheap systems with low energy consumption.
Case study: the Stewart Platform.
Raspberry Pi 2 Model B (RP2): 4 cores, ARMv7, 32 bits
Raspberry Pi 3 Model B (RP3): 4 cores, ARMv8, 64 bits
MKL is not available on the Raspberry Pi; LAPACK without multithreading is used for LAR.
CPU Intel Core i5 (MKL): 4 cores, no Hyper-Threading, 16 GB RAM
CPU Intel Xeon E5 (MKL): 2 hexa-cores = 12 cores, 32 GB RAM

Results: Raspberry Pi versus Multicore Systems
Table (numeric entries not preserved): columns nsg-t | nsg-hs | Execution time (RP2, RP3, i5, E5) | Energy consumption (RP2, RP3, i5, E5)
Number of groups: nsg=6; tend=300; tend2=200.

TDP: RP2 = RP3 = 4 W, i5 = 15 W, E5 = 95 W.
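The TDP values allow a rough energy comparison. The sketch below assumes that energy is estimated as TDP multiplied by execution time, which the pairing of the two table column groups suggests but the slides do not state explicitly. The function names are hypothetical, and no execution times are assumed: the second function only reports how much slower a low-power platform may run before it stops saving energy.

```python
# TDP values in watts, as given on the slide.
TDP_WATTS = {"RP2": 4.0, "RP3": 4.0, "i5": 15.0, "E5": 95.0}

def energy_joules(platform, seconds):
    """Estimated energy, assuming the platform draws its full TDP for the whole run."""
    return TDP_WATTS[platform] * seconds

def break_even_slowdown(slow, fast):
    """How many times slower `slow` may run than `fast` before it uses more energy."""
    return TDP_WATTS[fast] / TDP_WATTS[slow]

rp3_vs_e5 = break_even_slowdown("RP3", "E5")  # 95 / 4 = 23.75
rp3_vs_i5 = break_even_slowdown("RP3", "i5")  # 15 / 4 = 3.75
```

Under this model a Raspberry Pi uses less energy than the Xeon as long as it is less than 23.75 times slower, and less than the i5 as long as it is less than 3.75 times slower.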

i5: much lower power consumption, but slower.
RP: lowest TDP; for the smallest size (SP), competitive with general-purpose (GP) systems for control problems.
RP: low power consumption, price and size.

Results: Experiments on GPU
CPU Intel Xeon E5 (MKL): 2 hexa-cores = 12 cores, 32 GB RAM
GPU GTX950 (MAGMA)

Results: Experiments on GPU
Table (numeric entries not preserved): columns nsg-t | nsg-hs | Xeon OpenMP x MKL (6x2, 2x6) | GPU (MAGMA) | Speed-up 6x2/2x6 | Speed-up 6x2/MAGMA
The GPU would be advantageous only for large problems.
It is not advisable when low execution time or low power consumption is required.

Conclusions
The computational kinematic formulation based on group equations is a topological approach that exploits the kinematic structure of an MBS to divide it into several SGs of smaller size.
Parallel programming techniques can be applied to solve the equations independently.
The SP has been used to analyze the Group Equations formulation.
Lower execution times are obtained with the GE method than with the global formulation.
The speed-ups achieved are between 4 and 8.
Raspberry Pi: a good alternative to general-purpose multicores for small control problems, with similar times and lower price, power consumption and space.
The massive parallelism of GPUs is appropriate for large problems.

Future work
The use of other computational libraries (dense and sparse).
Auto-tuning techniques should be included in the routines to select the best parallel strategy and library, together with the values of some parameters (number of threads, number of steps).
For large MBS, the use of message-passing parallelism needs to be analyzed.
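One simple form of auto-tuning is an exhaustive search: run each candidate configuration, measure it, and keep the fastest. The sketch below illustrates that idea only; the function names, the candidate format (outer threads, inner threads) and the search strategy are assumptions for this example and do not describe the auto-tuning method the authors plan to use.

```python
import time

def measure_seconds(run, config, repeats=3):
    """Best-of-`repeats` wall-clock time of run(config)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(config)
        best = min(best, time.perf_counter() - start)
    return best

def autotune(candidates, cost):
    """Return the candidate with the smallest cost, plus the full cost table."""
    table = {c: cost(c) for c in candidates}
    return min(table, key=table.get), table

# Hypothetical usage: `run_solver` would execute the simulation with the
# given (outer_threads, inner_threads) pair.
# best, table = autotune([(1, 12), (2, 6), (6, 2)],
#                        lambda c: measure_seconds(run_solver, c))
```

Separating the search from the cost function keeps the selection logic testable with a synthetic cost, and lets the same search be reused for other parameters such as the choice of library.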
