"Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications"
1 "Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications" Ryan P. Russell (Assistant Professor) and Nitin Arora (Ph.D. Candidate), School of Aerospace Engineering, Georgia Institute of Technology. US-China Space Surveillance Technical Interchange, Beijing, China, Oct 2011.
2 Motivations Space debris: 2009 Iridium/Cosmos collision. Currently track ~15K objects; next-generation sensors will track ~100K objects. Need faster and better state and uncertainty prediction (covariance realism). [Figure: number of objects tracked]
3 Motivation As the near-space environment becomes more crowded, accurately tracking and cataloguing a growing number of objects becomes more demanding and requires high-fidelity spacecraft trajectory simulations. High-fidelity trajectory computation is slow, and the problem is compounded when tracking a large number of objects (~20K). There is a classic tradeoff between speed and accuracy: fast semi-analytic techniques (SGP4) versus high-fidelity Special Perturbations (SP). To achieve both in the context of real-time tracking of 100K or more resident space objects, a paradigm shift is necessary to make the problem tractable.
4 Aim To bring together innovations in fast force model computation AND single-computer parallel programming (i.e. GPU) to achieve BOTH speed and accuracy. Multiple orders of magnitude in speedup are sought while maintaining approximately the accuracy of SP. Components: fast ephemeris, fast gravity model, parallel integration, future models.
5 Approach We propose a high-fidelity spacecraft integration tool that takes advantage of: (1) new CPU-based fast and accurate perturbation models for high-fidelity gravity accelerations and ephemerides; (2) a Graphics Processing Unit (GPU) based Runge-Kutta solver that exploits the massive parallelism across multiple spacecraft. The expected speedups are multiplicative (100 x 100 = 10,000). The advantage of GPU-based parallelism is that a single user can run it without expensive computer clusters and without falling back on semi-analytic models (which lose accuracy).
6 Fast Force Models Fast and accurate luni-solar ephemeris and Earth orientation: FIRE. Reduces computational time (multiple orders of magnitude improvement!). Provides continuous and analytic first and second derivatives of states and orientation matrices. User friendly (requires no more expertise than using JPL's SPICE). High flexibility and portability. Fast, efficient and adaptive geopotential computation: FETCH. A 3D interpolation-based global gravity model that trades memory for speed. Non-singular and continuous to any order. Accurate to the error in the SH series. Scalable to any order/degree. Extremely user friendly.
7 FIRE for Fast Luni-Solar and Earth Orientation Ephemerides Reference: Arora, N., Russell, R. P., A Fast, Accurate, and Smooth Planetary Ephemeris Retrieval System, Celestial Mechanics and Dynamical Astronomy, Vol. 108, No. 2, 2010, pp , DOI /s NEED: position and velocity of the Moon and Sun, plus Earth orientation. Motivation: JPL SPICE ephemeris is ssssllloooowww (software can spend more than half its time getting ephemeris data). 2 orders of magnitude reduction in call time for body and orientation calls compared with SPICE. Custom built for problems that favor higher speeds and smooth derivatives (continuous and analytic first and second derivatives). fold improvement for trajectory propagation speeds (good for Monte Carlo runs, etc.). [Figure: share of run time spent on ephemeris load versus other load, CURRENT and PREFERRED]
8 Fast Geopotential Computations High-fidelity geopotentials are expensive to compute. Spherical harmonics (SH) is the conventional approach: a 200x200 SH field has ~40,000 terms and a spatial resolution of ~(100 km x 100 km). The recursive formulation is fast on a single processor but not amenable to parallel computing: SLOW. It is a bottleneck in many applications: orbit estimation, trajectory optimization, mission design. PRACTICE: fields are truncated and engineers live with the errors (blissfully unaware in most cases). Want: fast; continuous and smooth across the global domain; derivatives continuous to at least 3 orders across the global domain ($\partial^3 U/\partial r^3$ = Hessian of dynamics); singularity free; low memory footprint; easy to implement. (Image credit: GRACE) Pseudocode: subroutine get_sphericalharmonics(): do i = 1, n; SH(i) = f(SH(i-1)); end do. Recursive: term i depends on term i-1.
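The serial dependency in the recursive formulation can be illustrated with a minimal Python sketch. It assumes the simplest possible case, unnormalized Legendre polynomials built with the standard three-term (Bonnet) recurrence; a real SH evaluation uses normalized associated Legendre functions and is not reproduced here. The function name `legendre_recursive` is hypothetical.

```python
def legendre_recursive(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using the three-term recurrence
    n*P_n = (2n-1)*x*P_{n-1} - (n-1)*P_{n-2}."""
    p = [1.0]
    if n_max >= 1:
        p.append(x)
    for n in range(2, n_max + 1):
        # Term n needs terms n-1 and n-2, so this loop cannot be split
        # across independent threads without restructuring the algorithm.
        p.append(((2 * n - 1) * x * p[n - 1] - (n - 1) * p[n - 2]) / n)
    return p


if __name__ == "__main__":
    print(legendre_recursive(3, 0.5))
```

Each pass through the loop waits on the previous one, which is the property that makes the recursion fast on one processor yet hard to parallelize within a single evaluation.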
9 Fast Geopotential Computations: Two Solution Methods Method 1, Point MasCon (PMC) model: model the Earth potential (minus J2) as mass concentrations, so the acceleration reduces to simple two-body calculations that can be computed in parallel on Graphics Processing Units (GPUs). Strategy: solve the inverse gravity problem; reduce it to linear least squares; use an orthogonal solution method; optimize the location and number of mascons. Reference: Russell, R.P., Arora, N., Global Point Mascon Models for Simple, Accurate, and Parallel Geopotential Computation, Paper AAS , AAS/AIAA Space Flight Mechanics Meeting, New Orleans, LA, Feb 2011. Method 2, local weighted interpolation (FETCH model): precompute the potential (minus J2) in a 3D mesh around the Earth and use weighted interpolation between nodes, trading memory for speed. J. Junkins' implementation worked well 30 years ago. Strategy: modernize and improve the Junkins method with adaptive error control and local interpolants (each cell has an optimized interpolation polynomial). Reference: Arora, N., Russell, R.P., Fast, Efficient and Adaptive Interpolation of the Geopotential, Paper AAS , AAS/AIAA Astrodynamics Specialist Conference, Girdwood, AK, Aug
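For the point mascon idea, the acceleration at a field point is a plain sum of two-body terms, one per mass concentration. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name, the `(mu_j, position)` tuple layout, and the sample values are all hypothetical, and the fitted mascon parameters from the cited paper are not reproduced.

```python
def mascon_acceleration(r, mascons):
    """Sum point-mass accelerations at position r = (x, y, z).

    mascons is a list of (mu_j, (xj, yj, zj)) pairs, where mu_j is the
    gravitational parameter of mascon j (it may be negative in a fit).
    """
    ax = ay = az = 0.0
    for mu_j, (xj, yj, zj) in mascons:
        dx, dy, dz = r[0] - xj, r[1] - yj, r[2] - zj
        d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        # Every term is independent of the others, so on a GPU each
        # mascon (or block of mascons) could be handled by its own thread.
        ax -= mu_j * dx / d3
        ay -= mu_j * dy / d3
        az -= mu_j * dz / d3
    return (ax, ay, az)


if __name__ == "__main__":
    # Illustrative values only: two equal mascons placed symmetrically.
    demo = [(1.0, (0.0, 0.0, 1.0)), (1.0, (0.0, 0.0, -1.0))]
    print(mascon_acceleration((10.0, 0.0, 0.0), demo))
```

Unlike the SH recursion, no term depends on another, which is why this form maps naturally onto parallel hardware.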
10 Results (Interpolation Model) Example: 200x200 SH field. Domain is valid from the surface to the Moon. Speedups of up to ~300x compared to 200x200 spherical harmonics. Requires ~1.8 GB of memory. Breakeven point is a ~7x7 field. Continuous to 3 orders (derivatives are easy to compute). Outperforms the new Cubed-Sphere model (Colorado, Beylkin). Exact: our model computes acceleration as the direct gradient of our interpolating function (we do not fit acceleration). Faster: ~4-fold, we think; it is hard to tell, but we can calibrate against their break-even point of 20x20. ~Same memory, ~same accuracy. Continuous across all boundaries. Eliminates non-spherical gravity as a speed bottleneck in optimization, targeting, estimation, etc.
11 GPU Computing GPUs are multi-threaded computational engines that can execute hundreds of thousands of threads simultaneously. CUDA (Compute Unified Device Architecture) is a GPU-based parallel programming model and software environment. Programming architecture: the computational grid is divided into blocks and sub-blocks; each sub-block contains a certain number of threads; inter-thread communication is allowed within a sub-block; inter-block communication is not allowed. ===> PERSONAL SUPERCOMPUTER
12 GPU based RK integrator Runge-Kutta integrator (preliminary study): explicit fixed-step RK45. The step size is determined by the highest-eccentricity case, evaluated first on the CPU (future work includes a variable step on the GPU). Capable of using any force model implemented in C. The single-precision version gives up to 600x speedups and the double-precision version 150x to 200x (compared to a similar algorithm on the CPU, for thousands of objects in parallel). Parallelization structure: each thread is responsible for integrating its own trajectory, which leads to an embarrassingly parallel implementation with very little communication across the GPU threads. Shared memory is used to store ephemeris data (computed once on the CPU). Constant memory arrays store global grid data (needed for the FETCH model). Gravity model coefficients are stored in global memory. Positions for all bodies after each time step are stored and sent back to the CPU (e.g. for later use in solving the conjunction or similar problems). Provides multiplicative speedups when combined with FAST perturbation force models.
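The one-trajectory-per-thread structure can be sketched on the CPU in a few lines of Python. This is a stand-in under several assumptions: it uses the classical fourth-order Runge-Kutta scheme rather than the RK45 scheme named on the slide, the force model is plain two-body gravity rather than FETCH plus FIRE plus drag, the value of `MU_EARTH` is an assumed constant, and the outer loop over bodies plays the role of the GPU threads. All function names are hypothetical.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, assumed value for illustration


def two_body(state):
    """Time derivative of (x, y, z, vx, vy, vz) under point-mass gravity."""
    x, y, z, vx, vy, vz = state
    r3 = (x * x + y * y + z * z) ** 1.5
    return (vx, vy, vz, -MU_EARTH * x / r3, -MU_EARTH * y / r3, -MU_EARTH * z / r3)


def rk4_step(f, s, h):
    """One classical RK4 step of size h (fixed step, no error control)."""
    k1 = f(s)
    k2 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def propagate_batch(states, h, n_steps, f=two_body):
    """Integrate every body with the same fixed step h.

    No body reads another body's state, so each iteration of the outer
    loop could be assigned to its own GPU thread.
    """
    final = []
    for s in states:
        for _ in range(n_steps):
            s = rk4_step(f, s, h)
        final.append(s)
    return final


if __name__ == "__main__":
    r0 = 7000.0
    v0 = math.sqrt(MU_EARTH / r0)
    period = 2.0 * math.pi * math.sqrt(r0 ** 3 / MU_EARTH)
    print(propagate_batch([(r0, 0.0, 0.0, 0.0, v0, 0.0)], period / 2000, 2000))
```

Sharing one step size across all bodies keeps every thread on the same instruction path, which is presumably why the step is sized for the most demanding (highest-eccentricity) case.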
13 Overall Algorithmic Details CPU side: ~20K objects to be integrated; initialize FETCH, FIRE and RK-GPU; transfer one-time common data to the GPU; call the GPU-RK; call the GPU again for second-batch runs; solve problems (conjunction analysis etc.). GPU side: multiple threads, one thread per body; common GPU-based FETCH model + ephemeris perturbation model + simple drag model; RK45 fixed-step integration; transfer solution data back to the CPU.
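The bookkeeping behind "one thread per body" and "second batch runs" can be sketched as follows. This is a hedged illustration in Python, not CUDA: the helper names and the `max_per_batch` limit are hypothetical, and the default of 64 threads per block is the figure this deck quotes for its current configuration.

```python
def launch_config(n_bodies, threads_per_block=64):
    """Number of blocks needed so that every body gets its own thread."""
    n_blocks = (n_bodies + threads_per_block - 1) // threads_per_block  # ceiling division
    return n_blocks, threads_per_block


def body_index(block_idx, thread_idx, threads_per_block=64):
    """Global body index handled by a given (block, thread) pair."""
    return block_idx * threads_per_block + thread_idx


def split_batches(n_bodies, max_per_batch):
    """Half-open index ranges [start, stop) for successive GPU calls."""
    return [(start, min(start + max_per_batch, n_bodies))
            for start in range(0, n_bodies, max_per_batch)]


if __name__ == "__main__":
    print(launch_config(20000))
    print(split_batches(20000, 12000))
```

Because the block count is rounded up, the last block can contain threads whose `body_index` is at or beyond `n_bodies`; a kernel would need to guard against that before touching memory.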
14 Current Tool Configuration High-fidelity force model: ephemeris-based third-body (Sun and Moon) perturbation model, implemented via FIRE. Two-body plus higher-order gravity field acceleration obtained via the FETCH gravity model (implemented on the GPU) at 156x156 resolution (~200x speedups, 1.6 GB memory). Drag force implemented via a simple exponential model. The integration step size is determined on the CPU and copied over to the GPU. GPU execution configuration: fixed number of threads per block, currently set to 64; number of blocks determined dynamically at runtime; ~3 KB of shared memory used in double precision. A Fortran-to-CUDA wrapper file was developed for fast data transfer from Fortran to CUDA. The CPU implementation used for comparison is a highly tuned non-singular SH-based gravity model run through a variable-step RK45 integrator set to a unitless tolerance of 1E-12.
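A "simple exponential" drag model is commonly written as an exponentially decaying density combined with the standard drag acceleration formula. The sketch below assumes that textbook form; the deck does not give its reference density, reference altitude, scale height, or ballistic parameters, so every argument name and sample value here is a hypothetical placeholder.

```python
import math


def exponential_density(alt, rho0, alt0, scale_height):
    """Density rho0 * exp(-(alt - alt0) / scale_height), consistent units assumed."""
    return rho0 * math.exp(-(alt - alt0) / scale_height)


def drag_acceleration(v_rel, rho, cd_area_over_mass):
    """Drag acceleration a = -0.5 * rho * (Cd*A/m) * |v_rel| * v_rel."""
    speed = math.sqrt(sum(c * c for c in v_rel))
    k = -0.5 * rho * cd_area_over_mass * speed
    return tuple(k * c for c in v_rel)


if __name__ == "__main__":
    # Placeholder inputs in SI units, chosen only to exercise the functions.
    rho = exponential_density(alt=400e3, rho0=1e-12, alt0=400e3, scale_height=60e3)
    print(drag_acceleration((7500.0, 0.0, 0.0), rho, cd_area_over_mass=0.01))
```

The model needs only a handful of scalars per body, so it adds little to the per-thread memory footprint compared with the gravity grid.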
15 Performance Evaluation Cases Case 1, cluster of objects: objects clustered in a normal 3D distribution. Average orbital elements for reference: a = 6700 km, ecc = 0.20, inclination = 35 deg, true anomaly = 0.0. Case 2, random distribution of objects: objects are approximately uniformly distributed; perigee varies from 6478 to km; eccentricity varies between 0.01 and 0.9; other elements span the full range.
16 Test Configuration CPU: Intel Xeon 2.27 GHz, 8 GB of memory; compiled with Intel Fortran Compiler 11.0 with O2 optimizations. GPU: Tesla C2050 (Fermi architecture), 448 CUDA cores + 3 GB of onboard memory; compiled with NVCC compiler 4.0.
17 Case 1 Example Run
18 Absolute Performance Case 1 TOF = (10 min to 2 days) 10,000 objects simulated for ~ 2 days takes ~30 seconds
19 Speedup: Case 1 [Plot: speedup curves for 10,000 and 20,000 objects] We achieve in excess of four orders of magnitude in speedup. The high performance reflects the algorithm's effective use of the L1 cache.
20 Absolute Performance Case 2 TOF = (10 min to 2 hrs)
21 Speedup: Case 2 Speedup is roughly half that of Case 1 for the random-distribution case, due to L1 cache behavior (memory access). We still achieve 5000x over a tuned CPU implementation. In essence, this case represents a lower bound on the performance of our tool.
22 Conclusions Preliminary study/efforts: designed and implemented a high-fidelity spacecraft trajectory integration tool. A fixed-step Runge-Kutta integrator together with the high-fidelity FETCH model has been implemented on the GPU. Speedups of 3 to 4 orders of magnitude are reported. The biggest current limitation of the tool is an upper bound on either the number of bodies or the number of integration steps. The tool has immediate potential for a variety of space surveillance applications, including the conjunction problem, covariance realism, particle filters, and general Monte Carlo analyses.
23 Future Work Shift to a variable-step integrator (must implement the FIRE ephemeris on the GPU). Fast density model. Apply to actual catalogue TLEs. Propagate covariances as well as states. Use results for conjunction analyses: offline on the CPU, or directly on the GPU by including algorithms (Chan's, for example). Other applications: covariance realism, particle filters, and general Monte Carlo analyses.
28 Defining Speedup The comparison basis for all perturbing functions is the same for the CPU and GPU codes except for the gravity field accelerations, which are calculated by a non-singular SH-based algorithm on the CPU and by the FETCH gravity model on the GPU. The CPU time consists only of the time taken by a representative set of trajectory propagations, which is then extrapolated to the given number of objects. When timing the GPU calls, the memory transfer calls are not timed, because they are typically three orders of magnitude shorter than the absolute running times, especially for cases with a large number of integration steps; this has been verified by timing representative GPU memory transfer calls. The single trajectory integration used to obtain the fixed GPU step size is not included in the absolute GPU-RK running times.
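The speedup definition above amounts to a short calculation. The sketch below assumes the extrapolation is a simple mean of the sampled CPU times multiplied by the object count, which is one plausible reading of the slide rather than a confirmed detail; the function names and the sample timings are hypothetical.

```python
def extrapolated_cpu_time(sample_times_s, n_objects):
    """Estimate the CPU time for n_objects from a representative sample.

    Assumes the mean per-trajectory time of the sample applies to all objects.
    """
    mean_time = sum(sample_times_s) / len(sample_times_s)
    return mean_time * n_objects


def speedup(sample_times_s, n_objects, gpu_time_s):
    """Ratio of extrapolated CPU time to measured GPU kernel time.

    GPU memory transfers and the CPU-side step-size run are excluded,
    following the definition on the slide.
    """
    return extrapolated_cpu_time(sample_times_s, n_objects) / gpu_time_s


if __name__ == "__main__":
    # Made-up sample timings, used only to show the arithmetic.
    print(speedup([2.0, 4.0], n_objects=10000, gpu_time_s=30.0))
```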
29 Truth Spherical Harmonics Model GRACE GGM02C field, published and available online to degree and order 200x x140 comes from GRACE data; higher-order terms come from EGM96. This gives us a moving target for residuals depending on the degree of the SH field: ~8 digits for a 150x150 field, ~10 digits for a 10x10 field. Target for RMS(ε) of new models: 1 order of magnitude smaller. [Figure: accumulated errors by degree (from covariance of GGM02C solution)]
30 Performance of a high-fidelity solution fitting a 156x156 truncation of the GRACE field using mascons [Figure: surface potential]
31 Local Interpolation Model Discretization: regular grid in lat/lon; adaptive shell thickness in the radial direction. Each local shell has a 3D interpolating function. Use weighting functions to ensure continuity across shells. Allow for different interpolating functions in neighboring cells.
32 Weight Functions (a 2D example) Each local cell (four squares) is centered at a node of the grid and has its own polynomial interpolant $U_A(x,y)$. Any given square is overlapped by four cells (A, B, C, D). Compute U in the overlap region using $U_A, U_B, U_C, U_D$ and weighting functions $w_A, w_B, w_C, w_D$. Continuity (to any order) across boundaries is preserved. The local interpolation functions are decoupled, so each cell can be fit independently. [Figure: four overlapping cells A, B, C, D in the x-y plane]
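The blending idea can be shown in one dimension, where a point in the overlap region is covered by just two cells instead of four. The sketch below is a simplified stand-in: it uses a cubic "smoothstep" weight, which only guarantees continuity of the value and first derivative, whereas the slide describes weights that preserve continuity to any order. The function names and the two sample interpolants are hypothetical.

```python
def smoothstep(t):
    """Cubic weight rising from 0 at t=0 to 1 at t=1 with zero end slopes."""
    return t * t * (3.0 - 2.0 * t)


def blended_value(t, u_a, u_b):
    """Blend two local interpolants on the overlap region 0 <= t <= 1.

    Cell A is centered at t=0 and cell B at t=1. The weights sum to one,
    so if both interpolants agree the blend returns their common value.
    """
    w_b = smoothstep(t)
    w_a = 1.0 - w_b
    return w_a * u_a(t) + w_b * u_b(t)


if __name__ == "__main__":
    # Two different local fits of the same underlying function.
    fit_a = lambda t: 1.0 + t
    fit_b = lambda t: 2.0 - t * t
    print([blended_value(t / 4.0, fit_a, fit_b) for t in range(5)])
```

Because the weight of a cell falls to zero at its neighbor's node, each cell's polynomial can be fitted without reference to the others, which is the decoupling the slide points to.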
33 How to choose interpolating functions Depart from the Junkins method (to avoid 3D quadratures). Use analytic solutions to large least squares problems, obtained with the algebraic manipulator MAPLE. Consider a fifth order polynomial in each direction, leading to a total of 5x5x5=125 coefficients. Evaluate the truth model at, say, $10^3$ equally spaced locations. This leads to a simple least squares problem, $(H^T W H)\,x = H^T W y$. Use MAPLE to get the analytic inverse $(H^T W H)^{-1}$, so we can solve for the coefficients with a simple matrix multiply. Get analytic inverses for ~400 different interpolating functions. Then we can optimize coefficient generation at each cell by checking all options.
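The "invert once, then multiply" strategy can be illustrated with a toy problem small enough to invert by hand. The sketch below fits a two-coefficient line instead of a 125-coefficient 3D polynomial, and the hand-written 2x2 inverse plays the role that the MAPLE-generated analytic inverse plays on the slide. Function names and sample data are hypothetical.

```python
def build_projector(xs, ws):
    """Precompute P = (H^T W H)^{-1} H^T W for the model y = c0 + c1*x.

    H has rows [1, x_k] and W = diag(ws). The 2x2 normal matrix is
    inverted in closed form, so no numerical solver is needed later.
    """
    s0 = sum(ws)
    s1 = sum(w * x for x, w in zip(xs, ws))
    s2 = sum(w * x * x for x, w in zip(xs, ws))
    det = s0 * s2 - s1 * s1
    inv = ((s2 / det, -s1 / det), (-s1 / det, s0 / det))
    return [[w * (row[0] + row[1] * x) for x, w in zip(xs, ws)] for row in inv]


def fit_coefficients(projector, ys):
    """Solve for the coefficients with a single matrix-vector multiply."""
    return [sum(p * y for p, y in zip(row, ys)) for row in projector]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    proj = build_projector(xs, [1.0, 1.0, 1.0, 1.0])
    # The same projector is reused for every data set sampled at xs.
    print(fit_coefficients(proj, [2.0, 5.0, 8.0, 11.0]))
    print(fit_coefficients(proj, [1.0, 1.0, 1.0, 1.0]))
```

Since the sample locations are the same for every cell, the projector depends only on the choice of interpolating function, and each cell then costs one multiply.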
34 Adaptive Error Choose the target residual error using altitude and the SH error profile. For each cell, evaluate ~400 interpolating functions and choose the one that meets your error goal and has the lowest memory footprint.
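The per-cell selection rule reads naturally as a filter followed by a minimum. The sketch below assumes each candidate interpolating function has already been fitted and reduced to a residual and a coefficient count; the tuple layout, the function name, and the sample candidates are hypothetical, and the coefficient count is used as a proxy for memory footprint.

```python
def choose_interpolant(candidates, target_error):
    """Pick the cheapest candidate that meets the error goal.

    candidates is a list of (name, n_coefficients, max_residual) tuples.
    Returns None when no candidate meets the target.
    """
    feasible = [c for c in candidates if c[2] <= target_error]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[1])


if __name__ == "__main__":
    # Made-up candidates for one cell, used only to exercise the rule.
    cell_candidates = [
        ("3x3x3", 27, 1e-6),
        ("4x4x4", 64, 1e-9),
        ("5x5x5", 125, 1e-11),
    ]
    print(choose_interpolant(cell_candidates, target_error=1e-8))
```

A cell at high altitude, where the field is smooth, would typically pass the test with a small candidate, which is how the adaptive scheme keeps the overall memory footprint down.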
Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications
Fast, High-Fidelity, Multi-Spacecraft Trajectory Simulation for Space Catalogue Applications Nitin Arora and Ryan P. Russell Fast methods for high fidelity spacecraft trajectory propagation are becoming
More informationEfficient Conjunction Assessment using Modified Chebyshev Picard Iteration
Efficient Conjunction Assessment using Modified Chebyshev Picard Iteration Austin B. Probe, Brent Macomber, Julie Read, Robyn Woollands, Abhay Masher, and John L. Junkins Texas A&M University ABSTRACT
More informationSCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE. x8534, x8505,
SCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE Kenneth Kawahara (1) and Jonathan Lowe (2) (1) Analytical Graphics, Inc., 6404 Ivy Lane, Suite 810, Greenbelt, MD 20770, (240) 764 1500 x8534, kkawahara@agi.com
More informationTHE PROCESS OF PARALLELIZING THE CONJUNCTION PREDICTION ALGORITHM OF ESAS SSA CONJUNCTION PREDICTION SERVICE USING GPGPU
THE PROCESS OF PARALLELIZING THE CONJUNCTION PREDICTION ALGORITHM OF ESAS SSA CONJUNCTION PREDICTION SERVICE USING GPGPU M. Fehr, V. Navarro, L. Martin, and E. Fletcher European Space Astronomy Center,
More informationPhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.
Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationALTHOUGH a sphere is an ubiquitous object, constructing a
JOURNAL OF GUIDANCE,CONTROL, AND DYNAMICS Vol. 33, No. 2, March April 2010 Comparisons of the Cubed-Sphere Gravity Model with the Spherical Harmonics Brandon A. Jones, George H. Born, and Gregory Beylkin
More informationVery fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards
Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop
More informationACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS
ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation
More informationBallistic Coefficient Prediction for Resident Space Objects
Ballistic Coefficient Prediction for Resident Space Objects Dr. Ryan Russell, Nitin Arora, Vivek Vittaldev University of Texas at Austin Dr. David Gaylor, Jessica Anderson Emergent Space Technologies,
More informationGetting Started Processing DSN Data with ODTK
Getting Started Processing DSN Data with ODTK 1 Introduction ODTK can process several types of tracking data produced by JPL s deep space network (DSN): two-way sequential range, two- and three-way Doppler,
More informationUsing GPUs to compute the multilevel summation of electrostatic forces
Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of
More informationDirected Optimization On Stencil-based Computational Fluid Dynamics Application(s)
Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s) Islam Harb 08/21/2015 Agenda Motivation Research Challenges Contributions & Approach Results Conclusion Future Work 2
More informationCollision Risk Computation accounting for Complex geometries of involved objects
Collision Risk Computation accounting for Complex geometries of involved objects Noelia Sánchez-Ortiz (1), Ignacio Grande-Olalla (1), Klaus Merz (2) (1) Deimos Space, Ronda de Poniente 19, 28760, Tres
More informationCORAM: ESA S COLLISION RISK ASSESSMENT AND AVOIDANCE MANOEUVRES COMPUTATION TOOL *
IAA-AAS-DyCoSS2-14-5-3 CORAM: ESA S COLLISION RISK ASSESSMENT AND AVOIDANCE MANOEUVRES COMPUTATION TOOL * Juan Antonio Pulido Cobo, Noelia Sánchez Ortiz, Ignacio Grande Olalla, Klaus Merz, INTRODUCTION
More informationOn Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy
On Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy Jan Verschelde joint with Genady Yoffe and Xiangcheng Yu University of Illinois at Chicago Department of Mathematics, Statistics,
More informationMars Pinpoint Landing Trajectory Optimization Using Sequential Multiresolution Technique
Mars Pinpoint Landing Trajectory Optimization Using Sequential Multiresolution Technique * Jisong Zhao 1), Shuang Li 2) and Zhigang Wu 3) 1), 2) College of Astronautics, NUAA, Nanjing 210016, PRC 3) School
More informationIntermediate Parallel Programming & Cluster Computing
High Performance Computing Modernization Program (HPCMP) Summer 2011 Puerto Rico Workshop on Intermediate Parallel Programming & Cluster Computing in conjunction with the National Computational Science
More informationGenerators at the LHC
High Performance Computing for Event Generators at the LHC A Multi-Threaded Version of MCFM, J.M. Campbell, R.K. Ellis, W. Giele, 2015. Higgs boson production in association with a jet at NNLO using jettiness
More informationLUNAR TEMPERATURE CALCULATIONS ON A GPU
LUNAR TEMPERATURE CALCULATIONS ON A GPU Kyle M. Berney Department of Information & Computer Sciences Department of Mathematics University of Hawai i at Mānoa Honolulu, HI 96822 ABSTRACT Lunar surface temperature
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationGEOPHYS 242: Near Surface Geophysical Imaging. Class 8: Joint Geophysical Inversions Wed, April 20, 2011
GEOPHYS 4: Near Surface Geophysical Imaging Class 8: Joint Geophysical Inversions Wed, April, 11 Invert multiple types of data residuals simultaneously Apply soft mutual constraints: empirical, physical,
More informationAutomatic Scaling Iterative Computations. Aug. 7 th, 2012
Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics
More informationMissile External Aerodynamics Using Star-CCM+ Star European Conference 03/22-23/2011
Missile External Aerodynamics Using Star-CCM+ Star European Conference 03/22-23/2011 StarCCM_StarEurope_2011 4/6/11 1 Overview 2 Role of CFD in Aerodynamic Analyses Classical aerodynamics / Semi-Empirical
More informationParallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung
More informationEfficient Computation of Radial Distribution Function on GPUs
Efficient Computation of Radial Distribution Function on GPUs Yi-Cheng Tu * and Anand Kumar Department of Computer Science and Engineering University of South Florida, Tampa, Florida 2 Overview Introduction
More informationAnalysis and Visualization Algorithms in VMD
1 Analysis and Visualization Algorithms in VMD David Hardy Research/~dhardy/ NAIS: State-of-the-Art Algorithms for Molecular Dynamics (Presenting the work of John Stone.) VMD Visual Molecular Dynamics
More informationGPU Acceleration of the Longwave Rapid Radiative Transfer Model in WRF using CUDA Fortran. G. Ruetsch, M. Fatica, E. Phillips, N.
GPU Acceleration of the Longwave Rapid Radiative Transfer Model in WRF using CUDA Fortran G. Ruetsch, M. Fatica, E. Phillips, N. Juffa Outline WRF and RRTM Previous Work CUDA Fortran Features RRTM in CUDA
More informationESA S COLLISION RISK ASSESSMENT AND AVOIDANCE MANOEUVRES TOOL (CORAM)
ESA S COLLISION RISK ASSESSMENT AND AVOIDANCE MANOEUVRES TOOL (CORAM) Juan Antonio Pulido (1), Noelia Sánchez (2), Ignacio Grande (3) and Klaus Merz (4) (1)(2)(3) Elecnor Deimos Space, Ronda de Poniente,
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationTechnology for a better society. hetcomp.com
Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction
More informationN-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo
N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational
More informationAdaptive Mesh Astrophysical Fluid Simulations on GPU. San Jose 10/2/2009 Peng Wang, NVIDIA
Adaptive Mesh Astrophysical Fluid Simulations on GPU San Jose 10/2/2009 Peng Wang, NVIDIA Overview Astrophysical motivation & the Enzo code Finite volume method and adaptive mesh refinement (AMR) CUDA
More informationGPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC
GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT
More informationUniversity of Texas Center for Space Research. ICESAT/GLAS CSR SCF Release Notes for Orbit and Attitude Determination
University of Texas Center for Space Research ICESAT/GLAS CSR SCF Notes for Orbit and Attitude Determination Charles Webb Tim Urban Bob Schutz Version 1.0 August 2006 CSR SCF Notes for Orbit and Attitude
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationInternational Supercomputing Conference 2009
International Supercomputing Conference 2009 Implementation of a Lattice-Boltzmann-Method for Numerical Fluid Mechanics Using the nvidia CUDA Technology E. Riegel, T. Indinger, N.A. Adams Technische Universität
More informationCenter for Computational Science
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
More informationThe Spherical Harmonics Discrete Ordinate Method for Atmospheric Radiative Transfer
The Spherical Harmonics Discrete Ordinate Method for Atmospheric Radiative Transfer K. Franklin Evans Program in Atmospheric and Oceanic Sciences University of Colorado, Boulder Computational Methods in
More informationUniversity of Texas Center for Space Research ICESAT/GLAS Document: CSR SCF Release Notes for Orbit and Attitude Determination
University of Texas Center for Space Research ICESAT/GLAS Document: CSR SCF Notes for Orbit and Attitude Determination Tim Urban Sungkoo Bae Hyung-Jin Rim Charles Webb Sungpil Yoon Bob Schutz Version 3.0
More informationT6: Position-Based Simulation Methods in Computer Graphics. Jan Bender Miles Macklin Matthias Müller
T6: Position-Based Simulation Methods in Computer Graphics Jan Bender Miles Macklin Matthias Müller Jan Bender Organizer Professor at the Visual Computing Institute at Aachen University Research topics
More informationParallel Summation of Inter-Particle Forces in SPH
Parallel Summation of Inter-Particle Forces in SPH Fifth International Workshop on Meshfree Methods for Partial Differential Equations 17.-19. August 2009 Bonn Overview Smoothed particle hydrodynamics
More informationPART I - Fundamentals of Parallel Computing
PART I - Fundamentals of Parallel Computing Objectives What is scientific computing? The need for more computing power The need for parallel computing and parallel programs 1 What is scientific computing?
More informationAngles-Only Autonomous Rendezvous Navigation to a Space Resident Object
aa.stanford.edu damicos@stanford.edu stanford.edu Angles-Only Autonomous Rendezvous Navigation to a Space Resident Object Josh Sullivan PhD. Candidate, Space Rendezvous Laboratory PI: Dr. Simone D Amico
More informationJ. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst
Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function
More informationPython for Development of OpenMP and CUDA Kernels for Multidimensional Data
Python for Development of OpenMP and CUDA Kernels for Multidimensional Data Zane W. Bell 1, Greg G. Davidson 2, Ed D Azevedo 3, Thomas M. Evans 2, Wayne Joubert 4, John K. Munro, Jr. 5, Dilip R. Patlolla
More informationGPU Implementation of Implicit Runge-Kutta Methods
GPU Implementation of Implicit Runge-Kutta Methods Navchetan Awasthi, Abhijith J Supercomputer Education and Research Centre Indian Institute of Science, Bangalore, India navchetanawasthi@gmail.com, abhijith31792@gmail.com
More information(x, y, z) m 2. (x, y, z) ...] T. m 2. m = [m 1. m 3. Φ = r T V 1 r + λ 1. m T Wm. m T L T Lm + λ 2. m T Hm + λ 3. t(x, y, z) = m 1
Class 1: Joint Geophysical Inversions Wed, December 1, 29 Invert multiple types of data residuals simultaneously Apply soft mutual constraints: empirical, physical, statistical Deal with data in the same
More informationA CUBED SPHERE GRAVITY MODEL FOR FAST ORBIT PROPAGATION
(Preprint) AAS 9-137 A CUBED SPHERE GRAVITY MODEL FOR FAST ORBIT PROPAGATIO Brandon A. Jones, George H. Born, and Gregory Beylkin The cubed sphere model of the gravity field maps the primary body to the
More informationQuantitative study of computing time of direct/iterative solver for MoM by GPU computing
Quantitative study of computing time of direct/iterative solver for MoM by GPU computing Keisuke Konno 1a), Hajime Katsuda 2, Kei Yokokawa 1, Qiang Chen 1, Kunio Sawaya 3, and Qiaowei Yuan 4 1 Department
More informationFaster Simulations of the National Airspace System
Faster Simulations of the National Airspace System PK Menon Monish Tandale Sandy Wiraatmadja Optimal Synthesis Inc. Joseph Rios NASA Ames Research Center NVIDIA GPU Technology Conference 2010, San Jose,
More informationGeometric Rectification of Remote Sensing Images
Geometric Rectification of Remote Sensing Images Airborne TerrestriaL Applications Sensor (ATLAS) Nine flight paths were recorded over the city of Providence. 1 True color ATLAS image (bands 4, 2, 1 in