A Parallel Implementation of the 3D NUFFT on Distributed-Memory Systems
Yuanxun Bill Bao
May 31

1 Introduction

The non-uniform fast Fourier transform (NUFFT) algorithm was originally introduced by Dutt and Rokhlin [1] to generalize the FFT algorithm to nonequispaced data on the interval $[-\pi, \pi]$. In d dimensions, the NUFFT algorithm can achieve a complexity of $O\left(M^d \log M + N (\log \frac{1}{\epsilon})^d\right)$. Here ε is the precision of the computation, M is the number of Fourier modes in each dimension, and N is the total number of data points. The NUFFT algorithm arises in a variety of applications; we refer the reader to the discussions in [2, 3].

In this report, we focus on the parallel implementation of the NUFFT on large distributed-memory systems. We note that there have been recent developments in parallelizing the NUFFT on massively parallel distributed-memory systems: the PNFFT library [5]. Unlike the PNFFT library, which is based on the NFFT library in C, our implementation is based on the P3DFFT library [4] in Fortran. Because of time constraints and the scope of the project, our implementation is not yet optimized for performance. It is also restricted to the transform from the non-uniform physical domain to the uniform frequency domain (the type-1 transform).

2 The NUFFT

In 3D, the type-1 NUFFT is mainly concerned with evaluating the sum

$$F(k_1, k_2, k_3) = \frac{1}{N} \sum_{j=0}^{N-1} f_j \, e^{-i (k_1, k_2, k_3) \cdot \mathbf{x}_j}, \qquad (2.1)$$
where $\{\mathbf{x}_j\}_{j=0}^{N-1}$ are non-uniformly distributed sources in the domain $[-\pi, \pi]^3$, $k_i \in \{-M/2, \dots, M/2 - 1\}$ for $i = 1, 2, 3$, and the strength of the source $\mathbf{x}_j$ is $f_j = f(\mathbf{x}_j)$. A direct evaluation of the sum (2.1) requires $O(NM^3)$ operations. Typically, when $N \gg M$, direct evaluation of the sum is computationally intractable.

The 1D NUFFT algorithm can be summarized in three steps:

1. Gridding: for each source $x_j$, spread the strength $f_j$ to its $M_s$ nearest oversampled regular grid points on each side by convolving with a Gaussian. We use a Gaussian because it can be written as a tensor product in higher dimensions. The number of oversampled regular grid points is typically set to $M_r = 2M$. More specifically, the contribution of source $x_j$ to the grid point $2\pi(m + m')/M_r$ is
$$f_\tau\!\left(\frac{2\pi (m + m')}{M_r}\right) = f_j \, e^{-\left(x_j - 2\pi (m + m')/M_r\right)^2 / 4\tau}, \qquad (2.2)$$
where $2\pi m / M_r$ is the regular grid point nearest to $x_j$ and $-M_s < m' \le M_s$.
2. FFT: take the FFT of $f_\tau$ to obtain $F_\tau(k)$.
3. Deconvolution: $F(k) = \sqrt{\pi/\tau}\, e^{k^2 \tau} F_\tau(k)$.

In practice, for $\epsilon = 10^{-12}$, we set $M_s = 12$ and $\tau = 12/M^2$.

The 1D NUFFT algorithm generalizes easily to higher dimensions. Each source touches $2M_s = 24$ grid points per dimension. In d dimensions, the gridding step therefore takes $O(24^d N)$ exponential evaluations. The FFT step takes $O(M^d \log M)$ operations and the deconvolution step takes $O(M^d)$ multiplications. When $N \gg M$ and M is large, the runtime of a sequential NUFFT becomes quite expensive.

3 Distributed Memory Parallelism

We choose distributed-memory parallelism for the NUFFT for two reasons: steps 1 and 3 are localized, and existing libraries (FFTW, P3DFFT) compute the FFT on distributed-memory systems. To run the NUFFT on massively parallel distributed-memory systems such as Stampede, we employ a 2D (pencil-shaped) domain decomposition.
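Steps 1 to 3 of Section 2 can be illustrated serially in 1D before they are distributed. Steps 1 and 3 are the localized ones that the decomposition relies on.

The following is a minimal sketch, not the report's Fortran/P3DFFT code. It assumes 1D sources in $[-\pi, \pi]$ and uses the parameters quoted above ($M_r = 2M$, $M_s = 12$, $\tau = 12/M^2$). To stay dependency-free, it replaces the FFT with a naive DFT. It also applies the 1/N factor of (2.1) in the deconvolution step. The function names `direct_type1` and `nufft_type1_1d` are hypothetical.

```python
import cmath
import math
import random


def direct_type1(x, f, M):
    """Direct evaluation of (2.1) in 1D: F(k) = (1/N) sum_j f_j exp(-i k x_j)."""
    N = len(x)
    return [sum(fj * cmath.exp(-1j * k * xj) for xj, fj in zip(x, f)) / N
            for k in range(-M // 2, M // 2)]


def nufft_type1_1d(x, f, M, Ms=12):
    """Gaussian-gridding type-1 NUFFT in 1D; a naive DFT stands in for the FFT."""
    N = len(x)
    Mr = 2 * M                   # oversampled grid size M_r = 2M
    h = 2 * math.pi / Mr         # grid spacing 2*pi/M_r
    tau = 12.0 / M**2            # tau = 12/M^2, as quoted in the text
    f_tau = [0j] * Mr

    # Step 1 (gridding): spread f_j to grid points m + m', -Ms < m' <= Ms, as in (2.2).
    for xj, fj in zip(x, f):
        m = round(xj / h)                  # index of the nearest grid point
        for mp in range(-Ms + 1, Ms + 1):
            d = xj - h * (m + mp)          # distance to the (unwrapped) grid point
            f_tau[(m + mp) % Mr] += fj * math.exp(-d * d / (4 * tau))  # periodic wrap

    ks = range(-M // 2, M // 2)
    # Step 2 ("FFT"): F_tau(k) = (1/M_r) sum_m f_tau[m] exp(-i k 2 pi m / M_r).
    F_tau = [sum(v * cmath.exp(-1j * k * h * idx) for idx, v in enumerate(f_tau)) / Mr
             for k in ks]

    # Step 3 (deconvolution): F(k) = sqrt(pi/tau) e^{k^2 tau} F_tau(k); the 1/N of (2.1)
    # is applied here because f_tau above was built from the unnormalized f_j.
    return [math.sqrt(math.pi / tau) * math.exp(k * k * tau) * Fk / N
            for k, Fk in zip(ks, F_tau)]


if __name__ == "__main__":
    random.seed(0)
    M, N = 16, 50
    x = [random.uniform(-math.pi, math.pi) for _ in range(N)]
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
    exact, approx = direct_type1(x, f, M), nufft_type1_1d(x, f, M)
    err = max(abs(a - b) for a, b in zip(approx, exact)) / max(abs(b) for b in exact)
    print(f"max relative error: {err:.1e}")
```

Comparing against the direct sum is a cheap way to validate a gridding kernel. In a real code, the DFT loop would be replaced by an FFT library call.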
In this decomposition, the x-direction of the computational domain is local to each processor, while the y- and z-directions are distributed among a 2D grid of processors (Figure 1). Each processor is responsible for a pencil-shaped chunk of the computational domain (Figure 2).

Figure 1: An illustration of a 4 × 4 2D grid of processors. Each processor has 8 neighbors.

Each processor loops through all the sources and performs the gridding step. Inter-processor communication is necessary when nearby regular grid points of a source lie outside the computational domain of that processor (Figure 3a). To carry out this procedure efficiently, we extend the local computational domain of each processor to include a halo of ghost arrays. Gridding can therefore be done locally first; we then send the ghost arrays to the corresponding neighboring processors (Figure 3b). Once every processor has completed its ghost array exchanges, we call the parallel FFT provided by the P3DFFT library. The deconvolution step can be done locally within each processor. This completes the description of our parallel implementation of the NUFFT algorithm. We discuss the details of inter-processor communication next.
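The halo idea can be sketched without MPI by simulating the ranks in a single process. This sketch makes simplifying assumptions that are not taken from the report:

* It uses a 1D slab decomposition of a periodic 1D grid, so each rank has two neighbors rather than the eight of Figure 1.
* The ghost width equals $M_s$.
* Each source is assigned to the rank that owns its nearest grid point.

The sketch shows that spreading locally into a halo, then adding each halo into the neighbor that owns those points, reproduces the serial spreading up to rounding. The function names `spread_global` and `spread_with_halo` are hypothetical.

```python
import math
import random


def spread_global(x, f, Mr, Ms, tau):
    """Serial reference: Gaussian spreading of every source onto the full periodic grid."""
    h = 2 * math.pi / Mr
    grid = [0.0] * Mr
    for xj, fj in zip(x, f):
        m = round(xj / h)
        for mp in range(-Ms + 1, Ms + 1):
            d = xj - h * (m + mp)
            grid[(m + mp) % Mr] += fj * math.exp(-d * d / (4 * tau))
    return grid


def spread_with_halo(x, f, Mr, Ms, tau, P):
    """Simulate P ranks, each owning Mr // P consecutive grid points plus an Ms-wide halo."""
    h = 2 * math.pi / Mr
    n = Mr // P
    assert n * P == Mr and n >= Ms, "halo must reach only the immediate neighbours"
    # local[p][i] holds global grid index p*n - Ms + i (taken modulo Mr)
    local = [[0.0] * (n + 2 * Ms) for _ in range(P)]

    # Gridding, done locally: each source goes to the rank owning its nearest grid point.
    for xj, fj in zip(x, f):
        m = round(xj / h)
        p = (m % Mr) // n
        i0 = (m % Mr) - p * n + Ms          # local slot of the nearest grid point
        for mp in range(-Ms + 1, Ms + 1):
            d = xj - h * (m + mp)
            local[p][i0 + mp] += fj * math.exp(-d * d / (4 * tau))

    # Ghost exchange: each halo is added into the interior of the rank that owns it.
    owned = [row[Ms:Ms + n] for row in local]
    for p in range(P):
        left, right = (p - 1) % P, (p + 1) % P
        for i in range(Ms):
            owned[left][n - Ms + i] += local[p][i]      # left halo -> left neighbour
            owned[right][i] += local[p][Ms + n + i]     # right halo -> right neighbour
    return [v for row in owned for v in row]


if __name__ == "__main__":
    random.seed(0)
    M, Ms = 32, 12
    Mr, tau = 2 * M, 12.0 / M**2
    x = [random.uniform(-math.pi, math.pi) for _ in range(200)]
    f = [random.gauss(0, 1) for _ in range(200)]
    ref = spread_global(x, f, Mr, Ms, tau)
    par = spread_with_halo(x, f, Mr, Ms, tau, P=4)
    print(max(abs(a - b) for a, b in zip(ref, par)))
```

The condition `n >= Ms` keeps every ghost strip inside an immediate neighbor's slab. That condition is what lets the exchange be a fixed, neighbor-only pattern.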
Figure 2: The 2D domain decomposition of the computational domain (pencil-shaped).

Figure 3: (a) An illustration of a source point whose neighboring regular grid points lie outside the computational domain of a processor. (b) An illustration of ghost arrays being sent to their corresponding neighboring processors.
4 Inter-Processor Communications

We now discuss how inter-processor communications are carried out in our implementation. After each processor completes the gridding step, it must send the corresponding ghost arrays to its eight neighbors. The order in which ghost arrays are transferred among processors is designed to avoid deadlock at runtime.

As an illustration, we divide the 2D processor grid into two groups, odd-row and even-row processors, and show how North-South communications are carried out (Figure 4). First, the even-row processors send data to their North neighbors (MPI_Send) and wait to receive from their North neighbors (MPI_Recv). Meanwhile, the odd-row processors receive data from their South neighbors and send data to the South (Figure 4a). Next, the odd-row processors send data to the North and wait to receive from the North. At the same time, the even-row processors receive data from the South and send data to the South (Figure 4b). This completes all North-South data exchanges. The E-W, NE-SW and SE-NW communications can be carried out in a similar fashion. This implementation of inter-processor communication is not optimal, but it effectively avoids deadlock.

Figure 4: An illustration of North-South data exchange among even- and odd-row processors: (a) first phase, (b) second phase.
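A small simulation can check why the two-phase ordering avoids deadlock. The sketch below rests on three assumptions:

* Blocking sends use rendezvous semantics: a send completes only when the destination has posted the matching receive. The MPI standard permits `MPI_Send` to behave this way, so this is the worst case.
* One processor column is modeled, with periodic wrap-around.
* Row r-1 is the North neighbor of row r.

The function names are hypothetical. `naive_schedule`, in which every row sends North first, is included only for contrast.

```python
def north(r, P):
    # Assumed orientation: row r-1 is North of row r, with periodic wrap-around.
    return (r - 1) % P


def south(r, P):
    return (r + 1) % P


def odd_even_schedule(P):
    """Per-row operation lists for the two-phase North-South exchange."""
    progs = []
    for r in range(P):
        n, s = north(r, P), south(r, P)
        if r % 2 == 0:   # even row: phase (a) with North, phase (b) with South
            progs.append([("send", n), ("recv", n), ("recv", s), ("send", s)])
        else:            # odd row: phase (a) with South, phase (b) with North
            progs.append([("recv", s), ("send", s), ("send", n), ("recv", n)])
    return progs


def naive_schedule(P):
    """Every row sends North first, then receives from the South."""
    return [[("send", north(r, P)), ("recv", south(r, P))] for r in range(P)]


def runs_to_completion(progs):
    """Simulate blocking sends with rendezvous semantics: a send completes only when
    the destination's current operation is the matching recv. False means deadlock."""
    pc = [0] * len(progs)
    while True:
        progressed = False
        for a, prog in enumerate(progs):
            if pc[a] < len(prog) and prog[pc[a]][0] == "send":
                b = prog[pc[a]][1]
                if pc[b] < len(progs[b]) and progs[b][pc[b]] == ("recv", a):
                    pc[a] += 1
                    pc[b] += 1
                    progressed = True
        if all(pc[i] == len(p) for i, p in enumerate(progs)):
            return True
        if not progressed:
            return False


if __name__ == "__main__":
    print(runs_to_completion(odd_even_schedule(4)))  # True: two-phase scheme completes
    print(runs_to_completion(naive_schedule(4)))     # False: every row blocks in its send
```

With 4 rows, as in Figure 1, the two-phase schedule completes while the naive one stalls. In production MPI code, `MPI_Sendrecv` or non-blocking `MPI_Isend`/`MPI_Irecv` pairs are common alternatives that avoid hand-ordering the exchange.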
5 Results

We discuss the strong and weak scaling of our parallel implementation of the 3D NUFFT algorithm. For strong scaling, we consider M = 1024 and N = billion sources in $[0, 2\pi]^3$. The oversampled grid resolution is set to 2M. The other parameters are $\epsilon = 10^{-12}$, $M_s = 12$ and $\tau = 12/M^2$. We run on Stampede with 512, 1024, 2048 and 4096 processors.

In Figure 5a, we plot the average time per processor for the total computation, the gridding step, the MPI communication and the FFT. Each is plotted against the number of processors on a log-log scale. First, the cost of the algorithm is dominated by the gridding step. As the number of processors doubles, the total computation time and the gridding time are halved, which demonstrates the strong scaling of our implementation. The MPI communication time, though a minor part of the cost, also scales strongly. The FFT does not scale strongly because, once the domain is divided into pencils, the local pencils are too small for P3DFFT to show strong scaling. We believe that our implementation would continue to scale strongly if more processors could be requested (the maximum normal queue size on Stampede is 4096). Running the same input on a single processor would take almost 2 days, compared to 40 seconds on 4096 processors. This holds even setting aside whether the data would fit into memory.

For weak scaling, we keep the workload of each processor the same. With our current implementation, we can only compare input data sizes that differ by a factor of 8. We compare 512 vs 4096 processors, and 256 vs 2048 processors, with the number of sources per processor held fixed within each pair. Figure 5b shows that the time per processor required for each task is almost the same in both pairs. This demonstrates the weak scaling of our implementation.

6 Conclusion

In this project, we presented a parallel implementation of the 3D NUFFT algorithm on distributed-memory systems.
Our implementation uses a 2D domain decomposition and scales both weakly and strongly on a large distributed-memory system (e.g., Stampede). Future work includes optimizing memory access and data storage, implementing the type-2 and type-3 transforms, and comparing against a GPU implementation.
Figure 5: (a) Strong scaling of our parallel implementation of the 3D NUFFT. (b) Weak scaling of our parallel implementation of the 3D NUFFT.
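The single-processor estimate in Section 5 can be checked with simple arithmetic. Under the idealization of perfect strong scaling, $T_1 = p \, T_p = 4096 \times 40\,\mathrm{s} = 163{,}840\,\mathrm{s} \approx 45.5$ hours, which is almost 2 days.

The sketch also illustrates one plausible reading of why the weak-scaling runs differ by a factor of 8. This is an assumption; the report does not state the reason. Doubling M in 3D multiplies the oversampled grid by $2^3 = 8$. The grid sizes in the example are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def implied_serial_days(t_parallel_s, nprocs):
    """Single-processor time implied by perfect strong scaling, T_1 = p * T_p."""
    return t_parallel_s * nprocs / SECONDS_PER_DAY


def oversampled_points_per_proc(M, nprocs, ratio=2):
    """Points of the (ratio*M)^3 oversampled grid held by each of nprocs processors."""
    return (ratio * M) ** 3 // nprocs


if __name__ == "__main__":
    print(f"{implied_serial_days(40, 4096):.2f} days")  # prints 1.90 days (163,840 s)
    # Hypothetical weak-scaling pair: doubling M multiplies the grid by 8, so 8x the
    # processors keeps each processor's share of the grid unchanged.
    print(oversampled_points_per_proc(256, 512) == oversampled_points_per_proc(512, 4096))
```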
References

[1] A. Dutt and V. Rokhlin. Fast Fourier transforms for nonequispaced data. SIAM J. Sci. Comput., 14(6).
[2] L. Greengard and J.-Y. Lee. Accelerating the nonuniform fast Fourier transform. SIAM Rev., 46(3).
[3] J.-Y. Lee and L. Greengard. The type 3 nonuniform FFT and its applications. J. Comput. Phys., 206(1):1–5.
[4] D. Pekurovsky. P3DFFT: a framework for parallel computations of Fourier transforms in three dimensions. SIAM J. Sci. Comput., 34(4):C192–C209.
[5] M. Pippig and D. Potts. Parallel three-dimensional nonequispaced fast Fourier transforms and their application to particle simulation. SIAM J. Sci. Comput., 35(4):C411–C437.
More informationRadix-4 FFT Algorithms *
OpenStax-CNX module: m107 1 Radix-4 FFT Algorithms * Douglas L Jones This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 10 The radix-4 decimation-in-time
More informationHomework #4 Due Friday 10/27/06 at 5pm
CSE 160, Fall 2006 University of California, San Diego Homework #4 Due Friday 10/27/06 at 5pm 1. Interconnect. A k-ary d-cube is an interconnection network with k d nodes, and is a generalization of the
More informationParallelization of DQMC Simulations for Strongly Correlated Electron Systems
Parallelization of DQMC Simulations for Strongly Correlated Electron Systems Che-Rung Lee Dept. of Computer Science National Tsing-Hua University Taiwan joint work with I-Hsin Chung (IBM Research), Zhaojun
More informationNonuniform Fast Fourier Transforms Using Min-Max Interpolation
560 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 2, FEBRUARY 2003 Nonuniform Fast Fourier Transforms Using Min-Max Interpolation Jeffrey A. Fessler, Senior Member, IEEE, and Bradley P. Sutton,
More informationParallel Systems Prof. James L. Frankel Harvard University. Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved.
Parallel Systems Prof. James L. Frankel Harvard University Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved. Architectures SISD (Single Instruction, Single Data)
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationAliasing and Antialiasing. ITCS 4120/ Aliasing and Antialiasing
Aliasing and Antialiasing ITCS 4120/5120 1 Aliasing and Antialiasing What is Aliasing? Errors and Artifacts arising during rendering, due to the conversion from a continuously defined illumination field
More informationPicture quality requirements and NUT proposals for JPEG AIC
Mar. 2006, Cupertino Picture quality requirements and NUT proposals for JPEG AIC Jae-Jeong Hwang, Young Huh, Dai-Gyoung Kim Kunsan National Univ., KERI, Hanyang Univ. hwang@kunsan.ac.kr Contents 1. Picture
More informationEmpirical Analysis of Space Filling Curves for Scientific Computing Applications
Empirical Analysis of Space Filling Curves for Scientific Computing Applications Daryl DeFord 1 Ananth Kalyanaraman 2 1 Dartmouth College Department of Mathematics 2 Washington State University School
More informationIntermediate Parallel Programming & Cluster Computing
High Performance Computing Modernization Program (HPCMP) Summer 2011 Puerto Rico Workshop on Intermediate Parallel Programming & Cluster Computing in conjunction with the National Computational Science
More informationELGIN ACADEMY Mathematics Department Evaluation Booklet (Main) Name Reg
ELGIN ACADEMY Mathematics Department Evaluation Booklet (Main) Name Reg CfEM You should be able to use this evaluation booklet to help chart your progress in the Maths department from August in S1 until
More informationSpatial Interpolation & Geostatistics
(Z i Z j ) 2 / 2 Spatial Interpolation & Geostatistics Lag Lag Mean Distance between pairs of points 11/3/2016 GEO327G/386G, UT Austin 1 Tobler s Law All places are related, but nearby places are related
More informationHigh Performance Computing. Introduction to Parallel Computing
High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials
More informationSpatial Interpolation & Geostatistics
(Z i Z j ) 2 / 2 Spatial Interpolation & Geostatistics Lag Lag Mean Distance between pairs of points 1 Tobler s Law All places are related, but nearby places are related more than distant places Corollary:
More information