Partial Wave Analysis using Graphics Cards

Similar documents
Partial Wave Analysis at BES III harnessing the power of GPUs

Partial Wave Analysis at BES III harnessing the power of GPUs. Niklaus Berger IHEP Beijing INT Seattle, November 2009

General Purpose GPU Computing in Partial Wave Analysis

Partial wave analysis at BES III harnessing the power of GPUs

PARALLEL FRAMEWORK FOR PARTIAL WAVE ANALYSIS AT BES-III EXPERIMENT

ComPWA: A common amplitude analysis framework for PANDA

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Hall D and IT. at Internal Review of IT in the 12 GeV Era. Mark M. Ito. May 20, Hall D. Hall D and IT. M. Ito. Introduction.

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

Generators at the LHC

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPU Linear algebra extensions for GNU/Octave

Performance potential for simulating spin models on GPU

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Using Graphics Chips for General Purpose Computation

The Use of Cloud Computing Resources in an HPC Environment

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

Introduction to GPU hardware and to CUDA

designing a GPU Computing Solution

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo

PyPWA A Partial-Wave/Amplitude Analysis Software Framework

Chapter 3 Parallel Software

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

Approaches to acceleration: GPUs vs Intel MIC. Fabio AFFINITO SCAI department

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP

Current Trends in Computer Graphics Hardware

Highly Scalable Multi-Objective Test Suite Minimisation Using Graphics Card

GPU Programming Using NVIDIA CUDA

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

Modern Processor Architectures. L25: Modern Compiler Design

Splotch: High Performance Visualization using MPI, OpenMP and CUDA

GPGPU, 4th Meeting Mordechai Butrashvily, CEO GASS Company for Advanced Supercomputing Solutions

High Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

Addressing Heterogeneity in Manycore Applications

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Summary of CERN Workshop on Future Challenges in Tracking and Trigger Concepts. Abdeslem DJAOUI PPD Rutherford Appleton Laboratory

Experts in Application Acceleration Synective Labs AB

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Optimising the Mantevo benchmark suite for multi- and many-core architectures

SATGPU - A Step Change in Model Runtimes

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Administrivia. Minute Essay From 4/11

The Many-Core Revolution Understanding Change. Alejandro Cabrera January 29, 2009

Efficient and Scalable Shading for Many Lights

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Parallelism. Parallel Hardware. Introduction to Computer Systems

Accelerating image registration on GPUs

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN

Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU

GPUs and Emerging Architectures

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics

GPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017

GPU 101. Mike Bailey. Oregon State University

GPGPU Applications. for Hydrological and Atmospheric Simulations. and Visualizations on the Web. Ibrahim Demir

High-Performance Reservoir Simulations on Modern CPU-GPU Computational Platforms Abstract Introduction

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

HPC future trends from a science perspective

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Introduction to CUDA programming

Trends in HPC (hardware complexity and software challenges)

Accelerating CFD with Graphics Hardware

HPC Architectures. Types of resource currently in use

High Performance Computing (HPC) Introduction

Technology for a better society. hetcomp.com

GooFit: Massively parallel function evaluation

Distributed Virtual Reality Computation

CS 179: GPU Programming

Allinea Unified Environment

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing

Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations

Particle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA

arxiv: v1 [physics.ins-det] 11 Jul 2015

Striped Data Server for Scalable Parallel Data Analysis

Turbostream: A CFD solver for manycore

Speeding up MATLAB Applications Sean de Wolski Application Engineer

LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS

Overview of High Performance Computing

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

FMM implementation on CPU and GPU. Nail A. Gumerov (Lecture for CMSC 828E)

Programmable Graphics Hardware (GPU) A Primer

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

Trends and Challenges in Multicore Programming

ATS-GPU Real Time Signal Processing Software

Living Image. Release Notes Version 4.7

viewdle! - machine vision experts

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

Interpreting a genetic programming population on an nvidia Tesla

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Could you make the XNA functions yourself?

Domain Decomposition: Computational Fluid Dynamics

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

Two-Phase flows on massively parallel multi-gpu clusters

Transcription:

Partial Wave Analysis using Graphics Cards Niklaus Berger IHEP Beijing Hadron 2011, München

The (computational) problem with partial wave analysis n rec * * i=1 * 1 Ngen MC NMC * i=1 A complex calculation (repeated many times over) + = something potentially very slow lots of statistics at Babar, Belle, BES III, Compass, GlueX, Panda etc. 2 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Four years ago... I moved to IHEP Beijing All I remembered about partial waves was an unpleasant theory exam People at IHEP were worried about a 100 increase in statistics I did not know about partial waves, but new how to do things fast I happened to have just read a magazine article about computing on graphics processors Photo: Andreas Rodler 3 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Partial Wave Analysis as a Computational Problem Splits into subtasks: Building a model Determining model parameters through a fit to the data Judge fit results Iterate until satisfied Tightly coupled with the physicist: look at plots, adjust model and input parameters 4 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

From Model to Likelihood Decay amplitudes: Resonance and angular structure Sum over partial waves 5 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

From Model to Likelihood Decay amplitudes: Resonance and angular structure Sum over partial waves n i=1 Normalisation integral over phase space Product over data events 6 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

From Model to Likelihood n i=1 Normalisation integral over phase space Product over data events Log likelihood n i=1 * * rec N MC * 1 gen N MC i=1 * Sum over data events Sum over partial waves 7 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

{ From Model to Likelihood: Fixed Amplitudes n i=1 Normalisation integral over phase space Product over data events Log likelihood n i=1 { * * 2 rec { N MC * 1 gen N MC i=1 * Sum over data events Sum over partial waves Computationally intensive: O (N iteration N event N wave ) 2 8 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Going parallel! Almost all our hardware is now parallel Almost all our software is not Almost all our problems are trivially parallel (events!) The solution to speed problems is obvious... 9 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

How to do parallel? Farm/Cluster Lots of power Some inter-process communication Long latency (Network & Scheduling) Grid Almost infinite power Very limited inter-process communication Very long latency Multi-core CPU Finite power Very fast inter-process communication Almost no latency Graphics Processor Short latency Almost infinite floating-point power Fast communication with CPU 10 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Parallel PWA PWA is embarassingly parallel: Exactly the same (relatively simple) calculation for each event Every event has its own data, only fit parameters are shared Use parallel hardware and make use of Single Instruction - Multiple Data (SIMD) capabilities Very strong here: Graphics processors (GPUs): Cheap and powerful hardware 11 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Accessing the Power of GPUs Programming for the GPU is less straightforward than for the CPU Early days: Use graphics interface (OpenGL) - translate problem to drawing a picture Vendor low-level frameworks: Nvidida CUDA and ATI CAL Vendor higher level framework: Brook+ Independent commercial software: RapidMind Emerging standard: OpenCL 12 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

ATI Brook+ We started with using ATI Brook+ Was the first to provide double precision Hardware with best performance/ price Very clean programming model, narrow interface Had all of the early adopter problems Lots of bugs and limitations Small user base Mediocre support Uncertain future Now discontinued by AMD/ATI, we switched to OpenCL 13 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

OpenCL OpenCL is a vendor- and hardware independent standard for parallel computing (in principle...) Gives you lots of detailed control and optimization options...... at the cost of a very low level, hardware driver like interface No type safety, optimization depends on machine type For embarrassingly parallel tasks: use some higher level abstraction 14 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

GPUPWA at BES III GPUPWA is our running framework Just done transition to OpenCL GPU based tensor manipulation Management of partial waves GPU based normalisation integrals GPU based likelihoods GPU based analytic gradients Interface to ROOT::Minuit2 fitters Projections and plots using ROOT 30000 20000 10000 0 1.8 2 2.2 m(k + K - ) [GeV/c 2 ] See: http://gpupwa.sourceforge.net 15 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Performance (Brook+) We use a toy model J/ψ γ K + K - analysis for all performance studies Time/Iteration 10 s 1 s 0.1 s FORTRAN GPUPWA Sums on CPU 0.01 s GPUPWA Sums on GPU 0 s 0 200000 400000 Using an Intel Core 2 Quad 2.4 GHz workstation with 2 GB of RAM and an ATI Radeon 4870 GPU with 512 MB of RAM for measurements Number of Events 150 Speedup 16 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Performance (OpenCL) 0.06 0.05 Time/Iteration [s] 0.04 0.03 0.02 Brook+ OpenCL 0.01 0 100000 200000 500000 Events 17 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Performance (CPU/GPU) 10.0 Fortran Time/Iteration [s] 1.0 0.1 0.01 OpenCL CPU Brook+ OpenCL 0.001 0 100000 200000 500000 Events 18 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Indiana framework (Cleo-c, BES III and GlueX) Following a presentation by M. Shepherd; work done by M. Shepherd, R. Mitchell and H. Matevosyan, Indiana University Using a cluster with message passing interface (MPI) High-level inter-process communication; easy to code and debug Perform likelihood calculation in parallel; each node with a subset of data and MC Use Open MPI implementation of MPI2 (www.open-mpi.org) Scales well over multiple cores, with fast network also over small cluster Calculation on GPUs using Nvidias CUDA (also on a cluster) Need more than hundred-fold parallel tasks: amplitude calculation at event level Some cost for copying data to and from GPU Small fraction of code (large, expensive loops) ported to GPU Coding/debugging somewhat challenging 19 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Speed benchmarks Tested with a γp π + π + π - n analysis with 5 π + π + π - resonances and one floating Breit-Wigner mass Amplitudes and log likelihoods are done on the GPU(s), the rest on the CPU(s) CPU parallelizaition handled by MPI Preliminary conclusions: MPI paralellization is efficient It is difficult to use the full power of GPUs 20 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Multi-CPU scaling MPI allows very efficient parallelization of likelihood computation Only parameters and partial sums need to be exchanged between nodes User never needs to write MPI calls - all taken care of behind the scenes Fast and easy solution for multi-core systems 21 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Compute-intensive amplitudes on the GPU Same fit with one change: Compute π in the Breit-Wigner using the first n terms of the arctan Taylor-expansion Now the fit time is dominated by the computational complexity of the amplitude More compute intensive amplitudes, i.e. more sophisticated models, are an excellent match for GPU accelerated fitting 22 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

AmpTools Independent of the experiment and the particular physics process the amplitude analysis fit (i.e. construction of the likelihood) is pretty much the same This suggests it is possible to write a general software package that does all the heavy lifting especially regarding parallel computing The user provides code for two types of C++ objects: A recipe for calculating amplitudes, e.g., Breit-Wigner function -- no built-in physics! A mechanism to read data into the framework The user specifies how many amplitudes, what types, arguments, free parameters, etc., via a configuration file (limits recompiling between fits) Library has been used/developed at Indiana U. over the past several years -- has provided a unified approach for several analyses the group is working on They are now trying to make available for general use: amptools.sourceforge.net (although, at this stage, documentation/examples are under development) 23 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Speed is not the problem... We are fast enough, if we actually use our hardware This requires some work (which is however well invested...) This requires moving beyond FORTRAN (to some sort of C...) This will allow us to focus on the real problems... 24 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Fitting in the dark... In partial wave analysis, we perform fits with 20 (40, 60, more...) free parameters We will never know, whether we found the global minimum We can tell if a wave-set is sufficient, but can we know it is right? Can we even judge the goodness of fit? ( Badness is easy...) We know that there must be multiple solutions... There is detector resolution On the technical side: Could we get minimisers working with complex numbers? Could we get more control over the minimizers? Could we get a high level language building on OpenCL? 25 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Which results will be believed? However wrong the analysis, people will usually believe quantum numbers if there is a bump in the mass spectrum However right the analysis, people will usually not believe in a new resonance if there is no bump, especially if it is exotic 26 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München

Summary PWA profits from massively parallel computing on GPUs We have created a software framework to harness this power - speedups of two orders of magnitude User base at BES is growing, development continues OpenCL (and beyond) is the way to go Interesting work also ongoing at Indiana University - including multiple nodes via MPI and here in Munich PWA has fundamental problems because of fits with too(?) many free parameters With GlueX (JLAB) and PANDA (FAIR), big new PWA facilities are on the horizon what can we do? 27 Partial Wave Analysis on GPUs Niklaus Berger Hadron 2011 München