Fast Multipole and Related Algorithms
|
|
- Andrea Norman
- 5 years ago
- Views:
Transcription
1 Fast Multipole and Related Algorithms Ramani Duraiswami University of Maryland, College Park Joint work with Nail A. Gumerov
2 Efficiency by exploiting symmetry and A general paradigm structure Develop theory, algorithms and software that apply to general classes of data Recognize underlying mathematical symmetry Recognize structure in data Adapt algorithms to particular data How does this apply to the FMM?
3 FMM Fast summation of singular kernel functions Green s functions Φ(x,y) for classical operator L Solution y to forcing at points x Applications Electrostatics, Molecular/Stellar dynamics, Vortex Boundary Integral Methods Particle discretization of field equations
4 What is the FMM? Decompose singular sum in to local or sparse part + far-field and dense part
5 Factorization Trick apply to dense part Not FMM, but has some key ideas Consider v(y j )= N i=1 ij u i = N i=1 u i (x i y j ) 2 j=1,,m Naïve way to evaluate sum requires MN operations Instead can write the sum as v(y j )= ( N i=1 u i ) y j2 +( N i=1 u i x i2 ) 2y j ( N i=1 u i x i ) Can evaluate each bracketed sum over i =( N i=1 u i ), = ( N i=1 u i x i2 ), = ( N i=1 u i x i ) and evaluate v(y j )= y 2 j + 2y j Requires O(M+N) operations
6 Reduction of Complexity Straightforward (nested loops): Factorized: Complexity: O(MN) If p << min(m,n) then complexity reduces! Complexity: O(pN+pM)
7 Take this far-near decomposition and apply it recursively Far field and near field Take near-field and divide further Use translations Trees L2L M2M M2L W r1 (x W R * +t) x y 2 x x * +t t *2 r 1 R (R R) x *1* r W r (x * ) W 1 W r1 (x * +t) W r (x * ) y S W S 2 x *1 x * +t x (S S) t *2 W 1 x i r 1 r W r1 (x * +t) x * +t r y 1 (S R) S t S x (S S) *1 x *2 r W 1 x i
8 Data Structures Separate data into local and global Manage the factorizations Basic geometric construction --- WSPD Stuff inside one area, expanded as a series Translate series representation to other area Expand in the other area
9 Data Structures Need to automatically structure the source and target point sets Can t tile space automatically with nonoverlapping circles/spheres Classically, use quadtree/octrees A recent paper
10 Box and its neighbors
11 Data structures must support Separation: Cells satisfy local separation and WSPD properties across all levels. Indexing: Cell indices satisfy some order relation in memory. Spatial addressing: Cell centers are computable from addresses and each particle can find its bounding cell. Hierarchical addressing: Cell children, parent, and neighbor indices are computable from addresses. All this must be done in O(N log N) and must be parallelizable 1. Nail A. Gumerov, Ramani Duraiswami, and Y.A. Borovikov. Data structures, optimal choice of parameters, and complexity results for generalized fast multipole methods in d dimensions. CS/UMIACS UMIACS-TR , CS-TR Qi Hu, Nail Gumerov and Ramani Duraiswami. Parallel Algorithms for FMM Data Structures Yuancheng Luo and Ramani Duraiswami Alternative Tilings for the Fast Multipole Method on the Plane submitted. Available on arxiv. 2012
12 Direct method vs. FMM
13 Translations Representation size p 1 needs to be translated to another representation of size p 2 In general this is a O(p 1 p 2 ) computation Can reduce this cost by looking at symmetries and approximations of the matrix Coaxial and rotational symmetries exponential expansions and Integral Relations SVD based approximations Nail A. Gumerov and Ramani Duraiswami. A broadband fast multipole accelerated boundary element method for the three dimensional Helmholtz equation. Journal of the Acoustical Society of America, 125: , Nail A. Gumerov and Ramani Duraiswami. Comparison of the efficiency of translation operators used in the fast multipole method for the 3D Laplace equation. Technical Report CS-TR-4701 and UMIACS-TR , University of Maryland Nail A. Gumerov and Ramani Duraiswami. Recursions for the computation of multipole translation and rotation coefficients for the 3-D Helmholtz equation. SIAM Journal on Scientific Computing, 25(4): , 2003.
14 Helmholtz equation Issue: size of special function based factorizations grow with wavenumber times distance Means FMM is actually slower than direct multiplication at high frequencies! Rokhlin solution: diagonal factorization Our solution: level dependent truncation number; use of rotation based symmetries and local expansion with special functions; switch to diagonal representations later More numerically stable; easier to implement in BEM; cheaper
15 FMM on the GPU GPUs are extremely fast at doing computations which have high arithmetic intensity for each load/store, esp. MADD Do other computations, but not with as much speed-up A few cores Hundreds cores Control ALU ALU ALU ALU Cache DRAM CPU DRAM GPU NVIDIA Tesla C2050: 1.25 Tflops single 0.52 Tflops double 448 cores
16 GPU/heterogeneous FMM Moved to implement FMM on GPU almost with the introduction of CUDA (2007) FMM translation based on rotation translation, and showing that it is competitive or superior to asymptotically faster diagonal translations Real valued formulations Algorithm splitting Extension to heterogeneous architectures by mapping local sums to GPU (where they are most efficient) and tree based FMM to CPU
17 Time (s) Local summation on GPU (FMM) 1.E+02 1.E+01 Sparse Matrix-Vector Product s max =60 y=bx CPU: Time=CNs, s=8 -lmax N 1.E+00 CPU l max =5 y=cx GPU: 1.E-01 y=ax 2 4 Time=A 1 N+B 1 N/s+C 1 Ns 1.E-02 1.E GPU y=8 -lmax dx 2 +ex+8 lmax f b/c=16 read/write float computations 1.E-04 b/c=16 3D Laplace 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 Number of Sources access to box data These parameters depend on the hardware Nail A. Gumerov and Ramani Duraiswami. Fast multipole methods on graphical processors. Journal of Computational Physics, 227: , 2008.
18 FMM on heterogeneous architectures Insight --- why do gymnastics to fit tree algorithms with irregular access on the GPU GPUs hosted usually with a multicore CPU Let the GPU do local sum, rest on the CPU
19 Single node algorithm GPU work CPU work ODE solver: source receiver update particle positions, source strength data structure (octree and neighbors) source S-expansions translation stencils local direct sum upward S S downward S R R R receiver R-expansions time loop final sum of far-field and near-field interactions Task 3.5: Computational Considerations in Brownout Simulations 19
20 The algorithm flow chart Node A Node B positions, source strength positions, source strength ODE solver: source receiver update data structure (octree) merge octree data structure (neighbors) data structure (octree) merge octree data structure (neighbors) ODE solver: source receiver update redistributed particle redistributed particle single heterogeneous node algorithm single heterogeneous node algorithm exchange final R-expansions exchange final R-expansions final sum final sum Task 3.5: Computational Considerations in Brownout Simulations 20
21 Fast Gauss Transforms in high dimensions Sums of Gaussians arise in many applications in many dimensions Vortex methods; heat kernels; statistical density estimation; kernel methods in machine learning FMM data structures and factorizations do not work in higher dimensions Alternate global factorization; Data structures (based on k- center algorithm); and a automatically tuned algorithm developed in a series of papers Downloadable code from SourceForge (FigTree) Many papers applying this to vision, machine learning etc. Vikas C. Raykar and Ramani Duraiswami. The improved fast Gauss transform with applications to machine learning. In L. Bottou, O. Chapelle, D. Decoste, and J. Weston, editors, Large Scale Kernel Machines, pages MIT Press, 2007.
22 A few pretty pictures from papers 1 kd=0.96 (250 Hz) x z y p 4 x p 3 p 3 z y y p 3 z y z x x
23 Teaching the FMM Course notes since 2004 online Java Applet that demos the FMM 8 papers directly from the course enabled solidifying broad themes
GPU accelerated heterogeneous computing for Particle/FMM Approaches and for Acoustic Imaging
GPU accelerated heterogeneous computing for Particle/FMM Approaches and for Acoustic Imaging Ramani Duraiswami University of Maryland, College Park http://www.umiacs.umd.edu/~ramani With Nail A. Gumerov,
More informationCMSC 858M/AMSC 698R. Fast Multipole Methods. Nail A. Gumerov & Ramani Duraiswami. Lecture 20. Outline
CMSC 858M/AMSC 698R Fast Multipole Methods Nail A. Gumerov & Ramani Duraiswami Lecture 20 Outline Two parts of the FMM Data Structures FMM Cost/Optimization on CPU Fine Grain Parallelization for Multicore
More informationFMM implementation on CPU and GPU. Nail A. Gumerov (Lecture for CMSC 828E)
FMM implementation on CPU and GPU Nail A. Gumerov (Lecture for CMSC 828E) Outline Two parts of the FMM Data Structure Flow Chart of the Run Algorithm FMM Cost/Optimization on CPU Programming on GPU Fast
More informationIterative methods for use with the Fast Multipole Method
Iterative methods for use with the Fast Multipole Method Ramani Duraiswami Perceptual Interfaces and Reality Lab. Computer Science & UMIACS University of Maryland, College Park, MD Joint work with Nail
More informationTerascale on the desktop: Fast Multipole Methods on Graphical Processors
Terascale on the desktop: Fast Multipole Methods on Graphical Processors Nail A. Gumerov Fantalgo, LLC Institute for Advanced Computer Studies University of Maryland (joint work with Ramani Duraiswami)
More informationFMM Data Structures. Content. Introduction Hierarchical Space Subdivision with 2 d -Trees Hierarchical Indexing System Parent & Children Finding
FMM Data Structures Nail Gumerov & Ramani Duraiswami UMIACS [gumerov][ramani]@umiacs.umd.edu CSCAMM FAM4: 4/9/4 Duraiswami & Gumerov, -4 Content Introduction Hierarchical Space Subdivision with d -Trees
More informationA Kernel-independent Adaptive Fast Multipole Method
A Kernel-independent Adaptive Fast Multipole Method Lexing Ying Caltech Joint work with George Biros and Denis Zorin Problem Statement Given G an elliptic PDE kernel, e.g. {x i } points in {φ i } charges
More informationEfficient O(N log N) algorithms for scattered data interpolation
Efficient O(N log N) algorithms for scattered data interpolation Nail Gumerov University of Maryland Institute for Advanced Computer Studies Joint work with Ramani Duraiswami February Fourier Talks 2007
More informationFMM accelerated BEM for 3D Helmholtz equation
FMM accelerated BEM for 3D Helmholtz equation Nail A. Gumerov and Ramani Duraiswami Institute for Advanced Computer Studies University of Maryland, U.S.A. also @ Fantalgo, LLC, U.S.A. www.umiacs.umd.edu/~gumerov
More informationGPUML: Graphical processors for speeding up kernel machines
GPUML: Graphical processors for speeding up kernel machines http://www.umiacs.umd.edu/~balajiv/gpuml.htm Balaji Vasan Srinivasan, Qi Hu, Ramani Duraiswami Department of Computer Science, University of
More informationFast Multipole Method on the GPU
Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical
More informationScalable Fast Multipole Methods on Distributed Heterogeneous Architectures
Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures Qi Hu huqi@cs.umd.edu Nail A. Gumerov gumerov@umiacs.umd.edu Ramani Duraiswami ramani@umiacs.umd.edu Institute for Advanced Computer
More informationFast Multipole Methods. Linear Systems. Matrix vector product. An Introduction to Fast Multipole Methods.
An Introduction to Fast Multipole Methods Ramani Duraiswami Institute for Advanced Computer Studies University of Maryland, College Park http://www.umiacs.umd.edu/~ramani Joint work with Nail A. Gumerov
More informationFast Multipole Accelerated Indirect Boundary Elements for the Helmholtz Equation
Fast Multipole Accelerated Indirect Boundary Elements for the Helmholtz Equation Nail A. Gumerov Ross Adelman Ramani Duraiswami University of Maryland Institute for Advanced Computer Studies and Fantalgo,
More informationLow-rank Properties, Tree Structure, and Recursive Algorithms with Applications. Jingfang Huang Department of Mathematics UNC at Chapel Hill
Low-rank Properties, Tree Structure, and Recursive Algorithms with Applications Jingfang Huang Department of Mathematics UNC at Chapel Hill Fundamentals of Fast Multipole (type) Method Fundamentals: Low
More informationScientific Computing on Graphical Processors: FMM, Flagon, Signal Processing, Plasma and Astrophysics
Scientific Computing on Graphical Processors: FMM, Flagon, Signal Processing, Plasma and Astrophysics Ramani Duraiswami Computer Science & UMIACS University of Maryland, College Park Joint work with Nail
More informationCenter for Computational Science
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
More informationThe Fast Multipole Method on NVIDIA GPUs and Multicore Processors
The Fast Multipole Method on NVIDIA GPUs and Multicore Processors Toru Takahashi, a Cris Cecka, b Eric Darve c a b c Department of Mechanical Science and Engineering, Nagoya University Institute for Applied
More informationScalable Distributed Fast Multipole Methods
Scalable Distributed Fast Multipole Methods Qi Hu, Nail A. Gumerov, Ramani Duraiswami University of Maryland Institute for Advanced Computer Studies (UMIACS) Department of Computer Science, University
More informationFast multipole methods for axisymmetric geometries
Fast multipole methods for axisymmetric geometries Victor Churchill New York University May 13, 2016 Abstract Fast multipole methods (FMMs) are one of the main numerical methods used in the solution of
More informationExaFMM. Fast multipole method software aiming for exascale systems. User's Manual. Rio Yokota, L. A. Barba. November Revision 1
ExaFMM Fast multipole method software aiming for exascale systems User's Manual Rio Yokota, L. A. Barba November 2011 --- Revision 1 ExaFMM User's Manual i Revision History Name Date Notes Rio Yokota,
More informationUsing GPUs to compute the multilevel summation of electrostatic forces
Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of
More informationKernel Independent FMM
Kernel Independent FMM FMM Issues FMM requires analytical work to generate S expansions, R expansions, S S (M2M) translations S R (M2L) translations R R (L2L) translations Such analytical work leads to
More informationTree-based methods on GPUs
Tree-based methods on GPUs Felipe Cruz 1 and Matthew Knepley 2,3 1 Department of Mathematics University of Bristol 2 Computation Institute University of Chicago 3 Department of Molecular Biology and Physiology
More informationIntermediate Parallel Programming & Cluster Computing
High Performance Computing Modernization Program (HPCMP) Summer 2011 Puerto Rico Workshop on Intermediate Parallel Programming & Cluster Computing in conjunction with the National Computational Science
More information3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs
3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional
More informationFMM CMSC 878R/AMSC 698R. Lecture 13
FMM CMSC 878R/AMSC 698R Lecture 13 Outline Results of the MLFMM tests Itemized Asymptotic Complexity of the MLFMM; Optimization of the Grouping (Clustering) Parameter; Regular mesh; Random distributions.
More informationEmpirical Testing of Fast Kernel Density Estimation Algorithms
Empirical Testing of Fast Kernel Density Estimation Algorithms Dustin Lang Mike Klaas { dalang, klaas, nando }@cs.ubc.ca Dept. of Computer Science University of British Columbia Vancouver, BC, Canada V6T1Z4
More informationThe Fast Multipole Method (FMM)
The Fast Multipole Method (FMM) Motivation for FMM Computational Physics Problems involving mutual interactions of N particles Gravitational or Electrostatic forces Collective (but weak) long-range forces
More informationStudy and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou
Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm
More informationFast Methods with Sieve
Fast Methods with Sieve Matthew G Knepley Mathematics and Computer Science Division Argonne National Laboratory August 12, 2008 Workshop on Scientific Computing Simula Research, Oslo, Norway M. Knepley
More informationFast Multipole Method on GPU: Tackling 3-D Capacitance Extraction on Massively Parallel SIMD Platforms
Fast Multipole Method on GPU: Tackling 3-D Capacitance Extraction on Massively Parallel SIMD Platforms Xueqian Zhao Department of ECE Michigan Technological University Houghton, MI, 4993 xueqianz@mtu.edu
More informationc 2007 Society for Industrial and Applied Mathematics
SIAM J. SCI. COMPUT. Vol. 29, No. 5, pp. 1876 1899 c 2007 Society for Industrial and Applied Mathematics FAST RADIAL BASIS FUNCTION INTERPOLATION VIA PRECONDITIONED KRYLOV ITERATION NAIL A. GUMEROV AND
More informationFast Spherical Filtering in the Broadband FMBEM using a nonequally
Fast Spherical Filtering in the Broadband FMBEM using a nonequally spaced FFT Daniel R. Wilkes (1) and Alec. J. Duncan (1) (1) Centre for Marine Science and Technology, Department of Imaging and Applied
More informationDi Zhao Ohio State University MVAPICH User Group (MUG) Meeting, August , Columbus Ohio
Di Zhao zhao.1029@osu.edu Ohio State University MVAPICH User Group (MUG) Meeting, August 26-27 2013, Columbus Ohio Nvidia Kepler K20X Intel Xeon Phi 7120 Launch Date November 2012 Q2 2013 Processor Per-processor
More informationMulti-Domain Pattern. I. Problem. II. Driving Forces. III. Solution
Multi-Domain Pattern I. Problem The problem represents computations characterized by an underlying system of mathematical equations, often simulating behaviors of physical objects through discrete time
More informationHPC Algorithms and Applications
HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear
More informationCUDA Experiences: Over-Optimization and Future HPC
CUDA Experiences: Over-Optimization and Future HPC Carl Pearson 1, Simon Garcia De Gonzalo 2 Ph.D. candidates, Electrical and Computer Engineering 1 / Computer Science 2, University of Illinois Urbana-Champaign
More informationGPU Implementation of Implicit Runge-Kutta Methods
GPU Implementation of Implicit Runge-Kutta Methods Navchetan Awasthi, Abhijith J Supercomputer Education and Research Centre Indian Institute of Science, Bangalore, India navchetanawasthi@gmail.com, abhijith31792@gmail.com
More informationReconstruction of Trees from Laser Scan Data and further Simulation Topics
Reconstruction of Trees from Laser Scan Data and further Simulation Topics Helmholtz-Research Center, Munich Daniel Ritter http://www10.informatik.uni-erlangen.de Overview 1. Introduction of the Chair
More informationAn Efficient CUDA Implementation of a Tree-Based N-Body Algorithm. Martin Burtscher Department of Computer Science Texas State University-San Marcos
An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm Martin Burtscher Department of Computer Science Texas State University-San Marcos Mapping Regular Code to GPUs Regular codes Operate on
More informationComputational Fluid Dynamics (CFD) using Graphics Processing Units
Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores
More informationAccelerating Octo-Tiger: Stellar Mergers on Intel Knights Landing with HPX
Accelerating Octo-Tiger: Stellar Mergers on Intel Knights Landing with HPX David Pfander*, Gregor Daiß*, Dominic Marcello**, Hartmut Kaiser**, Dirk Pflüger* * University of Stuttgart ** Louisiana State
More informationEmpirical Comparisons of Fast Methods
Empirical Comparisons of Fast Methods Dustin Lang and Mike Klaas {dalang, klaas}@cs.ubc.ca University of British Columbia December 17, 2004 Fast N-Body Learning - Empirical Comparisons p. 1 Sum Kernel
More informationThe Fast Multipole Method and the Radiosity Kernel
and the Radiosity Kernel The Fast Multipole Method Sharat Chandran http://www.cse.iitb.ac.in/ sharat January 8, 2006 Page 1 of 43 (Joint work with Alap Karapurkar and Nitin Goel) 1 Copyright c 2005 Sharat
More informationPhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.
Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences
More informationAccelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University
Accelerating GPU computation through mixed-precision methods Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Outline Motivation Truncated Precision using CUDA Solving Linear
More informationA GPU-based Approximate SVD Algorithm Blake Foster, Sridhar Mahadevan, Rui Wang
A GPU-based Approximate SVD Algorithm Blake Foster, Sridhar Mahadevan, Rui Wang University of Massachusetts Amherst Introduction Singular Value Decomposition (SVD) A: m n matrix (m n) U, V: orthogonal
More informationEfficient Tridiagonal Solvers for ADI methods and Fluid Simulation
Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular
More informationAccelerated flow acoustic boundary element solver and the noise generation of fish
Accelerated flow acoustic boundary element solver and the noise generation of fish JUSTIN W. JAWORSKI, NATHAN WAGENHOFFER, KEITH W. MOORED LEHIGH UNIVERSITY, BETHLEHEM, USA FLINOVIA PENN STATE 27 APRIL
More informationPortable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.
Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization Dmitry I. Lyakh liakhdi@ornl.gov This research used resources of the Oak Ridge Leadership Computing Facility at the
More informationCauchy Fast Multipole Method for Analytic Kernel
Cauchy Fast Multipole Method for Analytic Kernel Pierre-David Létourneau 1 Cristopher Cecka 2 Eric Darve 1 1 Stanford University 2 Harvard University ICIAM July 2011 Pierre-David Létourneau Cristopher
More informationHierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs
JH160041-NAHI Hierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs Rio Yokota(Tokyo Institute of Technology) Hierarchical low-rank approximation methods such as H-matrix, H2-matrix,
More informationWhy Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends
Imagine stream processor; Bill Dally, Stanford Connection Machine CM; Thinking Machines Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz Eitan Grinspun Caltech Ian Farmer
More informationParticle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA
Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran
More informationOptimisation Myths and Facts as Seen in Statistical Physics
Optimisation Myths and Facts as Seen in Statistical Physics Massimo Bernaschi Institute for Applied Computing National Research Council & Computer Science Department University La Sapienza Rome - ITALY
More information8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 48 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More informationComparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2
Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2 Jingwei Zhu March 19, 2014 Instructor: Surya Pratap Vanka 1 Project Description The purpose of this
More information8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 22 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More informationExploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement
Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Tim Davis (Texas A&M University) with Sanjay Ranka, Mohamed Gadou (University of Florida) Nuri Yeralan (Microsoft) NVIDIA
More informationCenter for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop
Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationParallel Implementations of Gaussian Elimination
s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n
More informationA Broadband Fast Multipole Accelerated Boundary Element Method for the 3D Helmholtz Equation. Abstract
A Broadband Fast Multipole Accelerated Boundary Element Method for the 3D Helmholtz Equation Nail A. Gumerov and Ramani Duraiswami (Dated: 31 December 2007, Revised 22 July 2008, Revised 03 October 2008.)
More informationMultigrid Pattern. I. Problem. II. Driving Forces. III. Solution
Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids
More informationParallel multi-frontal solver for isogeometric finite element methods on GPU
Parallel multi-frontal solver for isogeometric finite element methods on GPU Maciej Paszyński Department of Computer Science, AGH University of Science and Technology, Kraków, Poland email: paszynsk@agh.edu.pl
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationFAST MATRIX-VECTOR PRODUCT BASED FGMRES FOR KERNEL MACHINES
FAST MATRIX-VECTOR PRODUCT BASED FGMRES FOR KERNEL MACHINES BALAJI VASAN SRINIVASAN, M.S. SCHOLARLY PAPER DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF MARYLAND, COLLEGE PARK Abstract. Kernel based approaches
More informationGPU-based Distributed Behavior Models with CUDA
GPU-based Distributed Behavior Models with CUDA Courtesy: YouTube, ISIS Lab, Universita degli Studi di Salerno Bradly Alicea Introduction Flocking: Reynolds boids algorithm. * models simple local behaviors
More informationAn Auto-Tuning Framework for Parallel Multicore Stencil Computations
Software Engineering Seminar Sebastian Hafen An Auto-Tuning Framework for Parallel Multicore Stencil Computations Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams 1 Stencils What is a
More informationOn the limits of (and opportunities for?) GPU acceleration
On the limits of (and opportunities for?) GPU acceleration Aparna Chandramowlishwaran, Jee Choi, Kenneth Czechowski, Murat (Efe) Guney, Logan Moon, Aashay Shringarpure, Richard (Rich) Vuduc HotPar 10,
More informationA TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE
A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationEXPOSING PARTICLE PARALLELISM IN THE XGC PIC CODE BY EXPLOITING GPU MEMORY HIERARCHY. Stephen Abbott, March
EXPOSING PARTICLE PARALLELISM IN THE XGC PIC CODE BY EXPLOITING GPU MEMORY HIERARCHY Stephen Abbott, March 26 2018 ACKNOWLEDGEMENTS Collaborators: Oak Ridge Nation Laboratory- Ed D Azevedo NVIDIA - Peng
More informationGPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis
GPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis Abstract: Lower upper (LU) factorization for sparse matrices is the most important computing step for circuit simulation
More informationThe success of multipole methods: Are we there yet?
King Abdullah University of Science and Technology Saudi Arabia, Apr. 2830, 2012 The success of multipole methods: Are we there yet? Lorena A Barba, Boston University 1948 The world s first computation
More informationSpatial Data Structures
CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration
More informationAccelerating Molecular Modeling Applications with Graphics Processors
Accelerating Molecular Modeling Applications with Graphics Processors John Stone Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign Research/gpu/ SIAM Conference
More informationScaling Fast Multipole Methods up to 4000 GPUs
Scaling Fast Multipole Methods up to 4000 GPUs Rio Yokota King Abdullah University of Science and Technology 4700 KAUST, Thuwal 23955-6900, Saudi Arabia +966-2-808-0321 rio.yokota@kaust.edu.sa Lorena Barba
More informationImplementation of rotation-based operators for Fast Multipole Method in X10 Andrew Haigh
School of Computer Science College of Engineering and Computer Science Implementation of rotation-based operators for Fast Multipole Method in X10 Andrew Haigh Supervisors: Dr Eric McCreath Josh Milthorpe
More information2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA
2006: Short-Range Molecular Dynamics on GPU San Jose, CA September 22, 2010 Peng Wang, NVIDIA Overview The LAMMPS molecular dynamics (MD) code Cell-list generation and force calculation Algorithm & performance
More informationDIMENSION REDUCTION FOR HYPERSPECTRAL DATA USING RANDOMIZED PCA AND LAPLACIAN EIGENMAPS
DIMENSION REDUCTION FOR HYPERSPECTRAL DATA USING RANDOMIZED PCA AND LAPLACIAN EIGENMAPS YIRAN LI APPLIED MATHEMATICS, STATISTICS AND SCIENTIFIC COMPUTING ADVISOR: DR. WOJTEK CZAJA, DR. JOHN BENEDETTO DEPARTMENT
More informationHigh performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli
High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering
More informationT6: Position-Based Simulation Methods in Computer Graphics. Jan Bender Miles Macklin Matthias Müller
T6: Position-Based Simulation Methods in Computer Graphics Jan Bender Miles Macklin Matthias Müller Jan Bender Organizer Professor at the Visual Computing Institute at Aachen University Research topics
More informationOptimizing the multipole-to-local operator in the fast multipole method for graphical processing units
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 2012; 89:105 133 Published online 3 August 2011 in Wiley Online Library (wileyonlinelibrary.com)..3240 Optimizing the
More informationThe Immersed Interface Method
The Immersed Interface Method Numerical Solutions of PDEs Involving Interfaces and Irregular Domains Zhiiin Li Kazufumi Ito North Carolina State University Raleigh, North Carolina Society for Industrial
More informationPanel methods are currently capable of rapidly solving the potential flow equation on rather complex
A Fast, Unstructured Panel Solver John Moore 8.337 Final Project, Fall, 202 A parallel high-order Boundary Element Method accelerated by the Fast Multipole Method is presented in this report. The case
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationSpatial Data Structures
CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/
More informationA comparison of parallel rank-structured solvers
A comparison of parallel rank-structured solvers François-Henry Rouet Livermore Software Technology Corporation, Lawrence Berkeley National Laboratory Joint work with: - LSTC: J. Anton, C. Ashcraft, C.
More informationAccelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic
Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago
More informationThe meshfree computation of stationary electric current densities in complex shaped conductors using 3D boundary element methods
Boundary Elements and Other Mesh Reduction Methods XXXVII 121 The meshfree computation of stationary electric current densities in complex shaped conductors using 3D boundary element methods A. Buchau
More informationCS560 Lecture Parallel Architecture 1
Parallel Architecture Announcements The RamCT merge is done! Please repost introductions. Manaf s office hours HW0 is due tomorrow night, please try RamCT submission HW1 has been posted Today Isoefficiency
More informationAutomatic online tuning for fast Gaussian summation
Automatic online tuning for fast Gaussian summation Vlad I. Morariu 1, Balaji V. Srinivasan 1, Vikas C. Raykar 2, Ramani Duraiswami 1, and Larry S. Davis 1 1 University of Maryland, College Park, MD 20742
More informationHow to Optimize Geometric Multigrid Methods on GPUs
How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient
More informationOpenACC programming for GPGPUs: Rotor wake simulation
DLR.de Chart 1 OpenACC programming for GPGPUs: Rotor wake simulation Melven Röhrig-Zöllner, Achim Basermann Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU) GPU computing
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationTERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis
TERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis Submitted By: Amrita Mishra 11104163 Manoj C 11104059 Under the Guidance of Dr. Sumana Gupta Professor Department of Electrical
More information2D vector fields 3. Contents. Line Integral Convolution (LIC) Image based flow visualization Vector field topology. Fast LIC Oriented LIC
2D vector fields 3 Scientific Visualization (Part 8) PD Dr.-Ing. Peter Hastreiter Contents Line Integral Convolution (LIC) Fast LIC Oriented LIC Image based flow visualization Vector field topology 2 Applied
More information