"Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications"


1 "Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications"
Ryan P. Russell, Assistant Professor
Nitin Arora, Ph.D. Candidate
School of Aerospace Engineering, Georgia Institute of Technology
US-China Space Surveillance Technical Interchange, Beijing, China, Oct 2011

2 Motivations
- Space debris: the 2009 Iridium/Cosmos collision
- ~15K objects are currently tracked
- Next-generation sensors will track ~100K objects
- Need faster and better state and uncertainty prediction
- Covariance realism
[Figure: number of objects tracked]

3 Motivation
- As the near-space environment grows more crowded, accurately tracking and cataloguing a growing number of objects becomes more demanding and requires high-fidelity spacecraft trajectory simulations.
- High-fidelity trajectory computation is slow, and the problem is compounded when tracking a large number of objects (~20K).
- Classic tradeoff between speed and accuracy: fast semi-analytic techniques (SGP4) versus high-fidelity Special Perturbations (SP).
- Achieving both for real-time tracking of on the order of 100K or more resident space objects requires a paradigm shift to make the problem tractable.

4 Aim
- Bring together innovations in fast force model computation AND single-computer parallel programming (i.e. GPU) to achieve BOTH speed and accuracy
- Multiple orders of magnitude in speedup are sought
- Maintain approximately the accuracy of SP
[Diagram: fast ephemeris, fast gravity model, parallel integration, future models]

5 Approach
We propose a high-fidelity spacecraft integration tool that takes advantage of:
- New (CPU-based) fast and accurate perturbation models for high-fidelity gravity accelerations and ephemerides
- A Graphics Processing Unit (GPU) based Runge-Kutta solver that exploits the massive parallelism across multiple spacecraft
The expected speedups are multiplicative (100 x 100 = 10,000).
The advantage of GPU-based parallelism is that a single user can run it without expensive computer clusters and without resorting to semi-analytic models (which lose accuracy).

6 Fast Force Models
Fast and accurate luni-solar ephemeris and Earth orientation: FIRE
- Reduction in computational time (multiple orders of magnitude improvement)
- Provides continuous and analytic first and second derivatives of states and orientation matrices
- User friendly (requires no more expertise than using JPL's SPICE)
- High flexibility and portability
Fast, efficient geopotential computation: FETCH
- 3D interpolation based global gravity model; trades memory for speed
- Non-singular and continuous to any order
- Accurate to the error in the SH series
- Scalable to any degree/order
- Extremely user friendly

7 FIRE for Fast Luni-Solar and Earth Orientation Ephemerides
Reference: Arora, N., Russell, R. P., "A Fast, Accurate, and Smooth Planetary Ephemeris Retrieval System," Celestial Mechanics and Dynamical Astronomy, Vol. 108, No. 2, 2010.
- Need: position and velocity of the Moon and Sun, and Earth orientation
- Motivation: the JPL SPICE ephemeris is slow (software can spend more than half its time getting ephemeris data)
- 2 orders of magnitude reduction in call time compared with body and orientation calls to SPICE
- Custom built for problems that favor higher speeds and smooth derivatives (continuous and analytic first and second derivatives)
- Improved trajectory propagation speeds (good for Monte Carlo runs, etc.)
[Figure: runtime split between ephemeris load and other load, current vs. preferred]


8 Fast Geopotential Computations
High-fidelity geopotentials are expensive to compute:
- Spherical harmonics (SH) are the conventional approach
- A 200x200 SH field has ~40,000 terms and a spatial resolution of ~(100 km x 100 km)
- Recursive formulation: fast on a single processor, not amenable to parallel computing
- SLOW: a bottleneck in many applications (orbit estimation, trajectory optimization, mission design)
In practice:
- Fields are truncated
- Engineers live with the errors (blissfully unaware in most cases)
Want:
- Fast
- Continuous and smooth across the global domain
- Derivatives continuous to at least 3 orders across the global domain ($\partial^3 U / \partial r^3$ = Hessian of the dynamics)
- Singularity free
- Low memory footprint
- Easy to implement
Recursive: term i depends on term i-1
  subroutine get_sphericalharmonics()
    do i = 1, n
      SH(i) = f(SH(i-1))
    end do
  end subroutine
[Image credit: GRACE]
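The term count and the sequential dependence noted on this slide can be illustrated with a short Python sketch. The function names are hypothetical, and the zonal Legendre three-term recurrence is used only as a simple stand-in for the full spherical harmonics recursion; it is not the authors' implementation.

```python
def sh_coefficient_count(n_max):
    """Count C_nm and S_nm coefficients for degrees 0..n_max.

    Degree n contributes n + 1 cosine terms and n sine terms (S_n0 is zero),
    so the total is the sum of (2n + 1), which equals (n_max + 1)^2.
    """
    return sum(2 * n + 1 for n in range(n_max + 1))


def legendre_zonal(n_max, x):
    """Legendre polynomials P_0..P_n_max at x via the three-term recurrence.

    Each new term needs the two previous ones, so the loop cannot be
    split across independent workers.
    """
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


print(sh_coefficient_count(200))
```

For a 200x200 field the count is 40,401, consistent with the ~40,000 terms quoted above.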


9 Fast Geopotential Computations: Two Solution Methods
Point MasCon (PMC) model
- Model the Earth potential (minus J2) as mass concentrations
- Simple two-body acceleration calculations
- Compute in parallel with Graphics Processing Units (GPUs)
- Strategy: solve the inverse gravity problem; reduce it to linear least squares; use an orthogonal solution method; optimize the location and number of mascons
- Reference: Russell, R.P., Arora, N., "Global Point Mascon Models for Simple, Accurate, and Parallel Geopotential Computation," Paper AAS, AAS/AIAA Space Flight Mechanics Meeting, New Orleans, LA, Feb 2011
Local weighted interpolation (FETCH) model
- Precompute the potential (minus J2) on a 3D mesh around the Earth
- Weighted interpolation between nodes
- Trade memory for speed
- J. Junkins' implementation worked well 30 years ago
- Strategy: modernize and improve the Junkins method with adaptive error control
- Local interpolants: each cell has its own optimized interpolation polynomial
- Reference: Arora, N., Russell, R.P., "Fast, Efficient and Adaptive Interpolation of the Geopotential," Paper AAS, AAS/AIAA Astrodynamics Specialist Conference, Girdwood, AK, Aug
[Figure: mascon geometry with labels r_j, r_cm, M_j]
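A minimal Python sketch of the point mascon idea: the perturbing acceleration is a plain sum of two-body terms, one per mass concentration. The function name, the units, and the data layout are assumptions made for illustration; the mascon values would come from the least squares fit described on the slide, which is not reproduced here.

```python
def mascon_acceleration(r, mascons):
    """Sum of point-mass accelerations at position r = (x, y, z) in km.

    mascons: list of (mu_j, (xj, yj, zj)), with mu_j in km^3/s^2.
    Returns (ax, ay, az) in km/s^2.
    """
    ax = ay = az = 0.0
    for mu_j, (xj, yj, zj) in mascons:
        dx, dy, dz = r[0] - xj, r[1] - yj, r[2] - zj
        d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        # Each term is independent of the others, so the sum parallelizes well.
        ax -= mu_j * dx / d3
        ay -= mu_j * dy / d3
        az -= mu_j * dz / d3
    return (ax, ay, az)
```

Unlike the recursive spherical harmonics evaluation, no term depends on another, which is what makes this form attractive for a GPU.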

10 Results (Interpolation Model)
Example: 200x200 SH field
- Domain is valid from the surface to the Moon
- Speedups of up to ~300x compared to 200x200 spherical harmonics
- Requires ~1.8 GB of memory
- Break-even point is a ~7x7 field
- Continuous to 3 orders (derivatives are easy to compute)
Outperforms the new Cubed-Sphere model (Colorado, Beylkin):
- Exact: our model computes the acceleration as the direct gradient of our interpolating function (we do not fit the acceleration)
- Faster: ~4-fold, we think; it is hard to tell, but we can calibrate against their break-even point of 20x20
- ~Same memory, ~same accuracy
- Continuous across all boundaries
Eliminates non-spherical gravity as a speed bottleneck in optimization, targeting, estimation, etc.


11 GPU Computing
- GPUs are multi-threaded computational engines
- They can execute hundreds of thousands of threads simultaneously
- CUDA (Compute Unified Device Architecture) is a GPU-based parallel programming model and software environment
Programming architecture
- The computational grid is divided into blocks and sub-blocks
- Each sub-block contains a certain number of threads
- Inter-thread communication is allowed within a sub-block
- Inter-block communication is not allowed
===> PERSONAL SUPERCOMPUTER

12 GPU based RK integrator
Runge-Kutta integrator (preliminary study)
- Explicit fixed-step RK45
- Step size is determined by the highest-eccentricity case, evaluated first on the CPU (future work includes a variable step on the GPU)
- Capable of using any force model implemented in C
- Single-precision version reaches up to 600x speedups; double precision reaches 150x to 200x (compared to a similar algorithm on the CPU, for thousands of objects in parallel)
Parallelization structure
- Each thread is responsible for integrating its own trajectory
- Leads to an embarrassingly parallel implementation with very little communication across the GPU threads
- Shared memory stores ephemeris data (computed once on the CPU)
- Constant memory arrays store global grid data (needed for the FETCH model)
- Gravity model coefficients are stored in global memory
- Positions of all bodies after each time step are stored and sent back to the CPU (e.g. for later use in solving the conjunction or similar problems)
- Provides multiplicative speedups when combined with the fast perturbation force models
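The one-trajectory-per-thread structure can be sketched serially in Python. This is an illustration under stated assumptions, not the tool's code: it uses the classical fourth-order Runge-Kutta scheme rather than the RK45 pair named on the slide, the force model is point-mass gravity only, and all names are hypothetical. The outer loop over objects is the part that would map to GPU threads.

```python
MU_EARTH = 398600.4418  # km^3/s^2, Earth gravitational parameter


def two_body(state):
    """Time derivative of [x, y, z, vx, vy, vz] under point-mass gravity."""
    x, y, z, vx, vy, vz = state
    r3 = (x * x + y * y + z * z) ** 1.5
    return [vx, vy, vz,
            -MU_EARTH * x / r3, -MU_EARTH * y / r3, -MU_EARTH * z / r3]


def rk4_step(f, s, h):
    """One classical fourth-order Runge-Kutta step of size h seconds."""
    k1 = f(s)
    k2 = f([a + 0.5 * h * b for a, b in zip(s, k1)])
    k3 = f([a + 0.5 * h * b for a, b in zip(s, k2)])
    k4 = f([a + h * b for a, b in zip(s, k3)])
    return [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def propagate_batch(states, h, n_steps, f=two_body):
    """Advance every object with the same fixed step; objects never interact."""
    out = []
    for s in states:  # on a GPU, each iteration would be one thread
        for _ in range(n_steps):
            s = rk4_step(f, s, h)
        out.append(s)
    return out
```

Because every object uses the same step size and step count, all threads execute the same instruction sequence, which suits the GPU execution model.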

13 Overall Algorithmic Details
CPU
- ~20K objects to be integrated
- Initialize FETCH, FIRE and RK-GPU
- Transfer one-time common data to the GPU
- Call the GPU-RK; call the GPU again for second batch runs
- Solve problems: conjunction analysis, etc.
GPU
- Multiple threads: one thread per body
- Common GPU-based FETCH model + ephemeris perturbation model + simple drag model
- RK45 fixed-step integration
- Transfer one-time solution data to the CPU

14 Current Tool Configuration
High-fidelity force model
- Ephemeris-based other-body (Sun and Moon) perturbation model, implemented via FIRE
- Two-body + higher-order gravity field acceleration obtained via the FETCH gravity model (implemented on the GPU) at 156x156 resolution (~200x speedups, 1.6 GB memory)
- Drag force implemented via a simple exponential model
- The integration step size is determined on the CPU and copied over to the GPU
GPU execution configuration
- Fixed number of threads per block: currently set to 64
- Number of blocks determined dynamically at runtime
- ~3 KB of shared memory used in double precision
- Fortran-to-CUDA wrapper file developed for fast data transfer from Fortran to CUDA
The CPU implementation used for comparison is a highly tuned non-singular SH-based gravity model driven by a variable-step RK45 integrator set to a unitless tolerance of 1E-12.
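How the number of blocks might be derived at runtime can be sketched as follows. The slide states only that threads per block is fixed at 64 and that the block count is determined dynamically; the ceiling division and the bounds check for padding threads are common CUDA practice assumed here, and the function names are hypothetical.

```python
def launch_config(n_objects, threads_per_block=64):
    """Return (blocks, threads_per_block) giving every object its own thread."""
    if n_objects < 0 or threads_per_block <= 0:
        raise ValueError("n_objects must be >= 0 and threads_per_block > 0")
    blocks = (n_objects + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def is_active(block, thread, n_objects, threads_per_block=64):
    """True if this (block, thread) pair maps to a real object rather than padding."""
    return block * threads_per_block + thread < n_objects


print(launch_config(20000))
```

For 20,000 objects this gives 313 blocks of 64 threads, with the last 32 threads of the final block idle.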


15 Performance Evaluation Cases
Case 1: cluster of objects
- Objects clustered in a normal 3D distribution
- Average orbital elements for reference: a = 6700 km, ecc = 0.20, inclination = 35 deg, true anomaly = 0.0
Case 2: random distribution of objects
- Objects are close to uniformly distributed
- Perigee varies from 6478 to km
- Eccentricity varies between 0.01 and 0.9
- Other elements span their full range

16 Test Configuration
CPU
- Intel Xeon, 2.27 GHz, 8 GB of memory
- Compiled with Intel Fortran Compiler 11.0 with O2 optimizations
GPU
- Tesla C2050: Fermi architecture based GPU
- 448 CUDA cores + 3 GB of onboard memory
- Compiled with the NVCC compiler 4.0

17 Case 1 Example Run

18 Absolute Performance: Case 1
- TOF = 10 min to 2 days
- Simulating 10,000 objects for ~2 days takes ~30 seconds

19 Speedup: Case 1
[Figure: speedup plot; labels 20,000 and 10,000]
- We achieve in excess of four orders of magnitude in speedup
- The high performance reflects the algorithm's good use of the L1 cache

20 Absolute Performance: Case 2
- TOF = 10 min to 2 hrs

21 Speedup: Case 2
- Speedup is ~half that of Case 1: the random distribution makes poorer use of the L1 cache (memory access)
- Still achieves ~5000x over a tuned CPU implementation
- In essence, this case represents a lower bound on the performance of our tool

22 Conclusions
Preliminary study/efforts
- Designed and implemented a high-fidelity spacecraft trajectory integration tool
- A fixed-step Runge-Kutta integrator, along with the high-fidelity FETCH model, has been implemented on the GPU
- 3 to 4 orders of magnitude in speedup are reported
- The biggest current limitation of the tool is an upper bound on either the number of bodies or the number of integration steps
- The tool has immediate potential for a variety of space surveillance applications, including the conjunction problem, covariance realism, particle filters, and general Monte Carlo analyses

23 Future Work
- Shift to a variable-step integrator (must implement the FIRE ephemeris on the GPU)
- Fast density model
- Apply to actual catalogue TLEs
- Propagate covariances as well as states
- Use results for conjunction analyses: offline on the CPU, or directly on the GPU by including algorithms (Chan's, for example)
- Other applications: covariance realism, particle filters, and general Monte Carlo analyses


28 Defining Speedup
- All perturbing functions used as the basis of comparison are the same for the CPU and GPU codes, except the gravity field accelerations: on the CPU they are calculated by a non-singular SH-based algorithm, and on the GPU by the FETCH gravity model.
- The CPU time consists only of the time taken by a representative set of trajectory propagations, which is then extrapolated to the given number of objects.
- GPU memory transfer calls are not included in the GPU timing, as they are typically three orders of magnitude shorter than the absolute running times, especially for cases with a large number of integration steps. This has been verified by timing representative GPU memory transfer calls.
- The single trajectory integration used to obtain the fixed GPU step size is not included in the absolute GPU-RK running times.
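The speedup definition above amounts to the following arithmetic, sketched in Python. The function names are hypothetical, the extrapolation is assumed to be a simple scaling of the mean sample time, and the numbers in the usage line are illustrative placeholders, not measurements from the study.

```python
def extrapolated_cpu_time(sample_times, n_objects):
    """Scale the mean time of a representative CPU sample up to n_objects."""
    if not sample_times:
        raise ValueError("need at least one sample time")
    return sum(sample_times) / len(sample_times) * n_objects


def speedup(sample_times, n_objects, gpu_time):
    """Extrapolated CPU time divided by the measured GPU-RK running time.

    gpu_time excludes memory transfers and the single CPU integration
    used to pick the fixed step size, as described on the slide.
    """
    return extrapolated_cpu_time(sample_times, n_objects) / gpu_time


# Illustrative values only: two sampled CPU runs, 100 objects, 2 s on the GPU.
print(speedup([0.5, 1.5], 100, 2.0))
```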

29 Truth Spherical Harmonics Model
- GRACE GGM02C field, published and available online to degree and order 200x200
- 140x140 comes from GRACE data; higher-order terms come from EGM96
- Gives us a moving target for residuals depending on the degree of the SH field: ~8 digits for a 150x150 field, ~10 digits for a 10x10 field
- Target for RMS(ε) of new models: 1 order of magnitude smaller
[Figure: accumulated errors by degree (from the covariance of the GGM02C solution)]

30 Performance of a high-fidelity solution fitting a 156x156 truncation of the GRACE field using mascons
[Figure: surface potential]


31 Local Interpolation Model
Discretization
- Regular grid in lat/lon
- Adaptive shell thickness in the radial direction
- Each local shell has a 3D interpolating function
- Weighting functions ensure continuity across shells
- Different interpolating functions are allowed in neighboring cells

32 Weight Functions
A 2D example:
- Each local cell (four squares) is centered at a node of the grid and has its own polynomial interpolant U_A(x,y)
- Any given square is overlapped by four cells (A, B, C, D)
- Compute U in the overlap region using U_A, U_B, U_C, U_D and the weighting functions w_A, w_B, w_C, w_D
- Continuity (to any order) across boundaries is preserved
- Local interpolation functions are decoupled: fit each cell independently
[Figure: 2D grid with axes x, y and overlapping cells A, B, C, D]
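The blending idea can be shown in one dimension with two overlapping cells. This Python sketch is an assumption-laden simplification: it uses a cubic smoothstep weight, which gives only first-derivative continuity, whereas the slide claims continuity to any order for the authors' weight functions, whose exact form is not given here. The names are hypothetical.

```python
def smoothstep(t):
    """Cubic blend rising from 0 at t=0 to 1 at t=1 with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)


def blend(t, u_a, u_b):
    """Weighted combination of two local interpolants on the overlap 0 <= t <= 1.

    The weights w_a and w_b sum to one everywhere, so if both cells
    reproduce the same function the blend reproduces it too.
    """
    w_b = smoothstep(t)
    w_a = 1.0 - w_b
    return w_a * u_a(t) + w_b * u_b(t)
```

At t = 0 the result is exactly cell A's interpolant and at t = 1 exactly cell B's, so each cell can be fitted independently without creating a jump at the boundary.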

33 How to choose interpolating functions
- Depart from the Junkins method (to avoid 3D quadratures)
- Use analytic solutions to large least squares problems, obtained with the algebraic manipulator MAPLE
- Consider a fifth-order polynomial in each direction, leading to a total of 5x5x5 = 125 coefficients
- Evaluate the truth model at, say, $10^3$ equally spaced locations
- This leads to a simple least squares problem: $(H^T W H)\,x = H^T W y$
- Use MAPLE to get the analytic inverse $(H^T W H)^{-1}$, so we can solve for the coefficients with a simple matrix multiply
- Get analytic inverses for ~400 different interpolating functions
- Then we can optimize coefficient generation at each cell by checking all options
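The reason the inverse is worth precomputing can be seen in a small one-dimensional Python sketch: the sample locations are the same in every cell, so the matrix $(H^T W H)^{-1} H^T W$ is built once and each cell's coefficients follow from a single matrix-vector product. This sketch inverts numerically with Gauss-Jordan elimination instead of the analytic MAPLE inverse described on the slide, uses a monomial basis as an assumption, and all names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (assumes a is non-singular)."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def solve_matrix(xs, weights, n_coef):
    """Precompute M = (H^T W H)^-1 H^T W for the basis 1, x, ..., x^(n_coef-1)."""
    h = [[x ** k for k in range(n_coef)] for x in xs]
    htw = [[h[i][k] * weights[i] for i in range(len(xs))] for k in range(n_coef)]
    return matmul(inverse(matmul(htw, h)), htw)


def fit(m, ys):
    """Coefficients for one cell: a single matrix-vector multiply."""
    return [sum(mij * y for mij, y in zip(row, ys)) for row in m]


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
m = solve_matrix(xs, [1.0] * len(xs), 3)          # built once
coeffs = fit(m, [2.0 - 3.0 * x + 0.5 * x * x for x in xs])  # reused per cell
```

In the usage lines, the samples come from an exact quadratic, so the fit recovers its coefficients up to rounding.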

34 Adaptive Error
- Choose a target residual error using the altitude and the SH error profile
- For each cell, evaluate ~400 interpolating functions and choose the one that meets your error goal and has the lowest memory footprint
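The per-cell selection rule can be sketched in Python as a filter followed by a minimum. The tuple layout, the use of coefficient count as the measure of memory footprint, and the candidate values in the usage lines are assumptions made for illustration only.

```python
def choose_interpolant(candidates, target_error):
    """Pick the smallest candidate that meets the error goal.

    candidates: list of (name, n_coefficients, max_residual) for one cell.
    Returns the chosen tuple, or None if no candidate meets the target.
    """
    ok = [c for c in candidates if c[2] <= target_error]
    if not ok:
        return None
    # Fewest coefficients first; ties go to the smaller residual.
    return min(ok, key=lambda c: (c[1], c[2]))


# Hypothetical candidates for one cell.
cell = [("3x3x3", 27, 1e-6), ("4x4x4", 64, 1e-9), ("5x5x5", 125, 1e-11)]
print(choose_interpolant(cell, 1e-8))
```

With a looser target at high altitude, a smaller polynomial qualifies, which is how the adaptive scheme keeps the total memory footprint down.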
