QUDA Programming For Staggered Quarks


1 QUDA Programming For Staggered Quarks
Steven Gottlieb, Guochun Shi, Aaron Torok, Volodymyr Kindratenko
National Center for Supercomputing Applications & Indiana University

2 Outline
- Background
- New staggered routines
- Benchmarks
- Production experience
- Future

3 Background
- NCSA's Innovative Systems Laboratory worked on a Cell B.E. port of MILC (arXiv:)
- Boston University: QUDA for Wilson-type quarks (arXiv: , arXiv: )
- 2009: sabbatical at NCSA
- We extend QUDA to staggered quarks

4 New Staggered Code
- First effort was a staggered dslash for a single GPU
- Extended to CG and then multimass CG (see the sketch below)
- Fat link computation
- Gauge force
- Fermion force
- Next step was to merge into the BU QUDA code, which had evolved from the initial release
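To make the inverter structure concrete, here is a minimal host-side sketch of the conjugate-gradient loop that the GPU inverter builds around the staggered operator. It is illustrative only: apply_matrix is a hypothetical stand-in for the even-odd preconditioned operator D†D + m² that the GPU kernels actually implement, and the multimass solver follows the same pattern while amortizing the single matrix-vector product per iteration over all masses.

```cpp
// Minimal CG sketch (host side, illustrative). apply_matrix() is a
// hypothetical stand-in for the even-odd preconditioned staggered
// operator (D^dagger D + m^2) implemented by the GPU kernels.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Stand-in operator: a simple SPD matrix (diagonal plus mass term).
static void apply_matrix(const Vec& in, Vec& out, double mass) {
    for (size_t i = 0; i < in.size(); ++i)
        out[i] = (2.0 + (i % 3)) * in[i] + mass * mass * in[i];
}

// Solve (A + m^2) x = b by conjugate gradient; returns iteration count.
static int cg_solve(Vec& x, const Vec& b, double mass, double tol, int maxit) {
    Vec r = b, p = b, Ap(b.size());
    std::fill(x.begin(), x.end(), 0.0);
    double rr = dot(r, r);
    for (int k = 0; k < maxit; ++k) {
        apply_matrix(p, Ap, mass);               // one mat-vec per iteration
        double alpha = rr / dot(p, Ap);
        for (size_t i = 0; i < x.size(); ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        if (std::sqrt(rr_new) < tol) return k + 1;
        double beta = rr_new / rr;
        for (size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return maxit;
}

int main() {
    Vec b(16, 1.0), x(16);
    int iters = cg_solve(x, b, /*mass=*/0.05, /*tol=*/1e-10, /*maxit=*/100);
    std::printf("converged in %d iterations\n", iters);
    return 0;
}
```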

5 MultiGPU Code
- MultiGPU inverter is now working
- Can overlap communication with computation by using interior and exterior kernels (see the sketch below), or use a single kernel that is launched after the message from the neighbor has been transferred to the GPU
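The overlap pattern is the standard CUDA one: launch the interior kernel on one stream while the boundary exchange proceeds on another, then run the exterior kernel on the boundary sites once the ghost data has arrived and the interior result is available. The sketch below uses placeholder kernel and buffer names (not the actual QUDA kernels), and the MPI exchange between the two copies is elided.

```cpp
// Illustrative interior/exterior overlap sketch (placeholder kernels,
// not the actual QUDA code). Compile with nvcc.
#include <cuda_runtime.h>

__global__ void interior_dslash(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];          // stand-in for bulk sites
}

__global__ void exterior_dslash(float* out, const float* ghost, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] += ghost[i];             // stand-in for boundary sites
}

int main() {
    const int n = 1 << 20, nb = 1 << 12;       // bulk and boundary sizes (illustrative)
    float *d_in, *d_out, *d_ghost, *h_ghost;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_ghost, nb * sizeof(float));
    cudaMallocHost(&h_ghost, nb * sizeof(float));   // pinned for async copies

    cudaStream_t compute, comms;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&comms);
    cudaEvent_t interior_done;
    cudaEventCreate(&interior_done);

    // 1. Interior kernel runs on the compute stream; it needs no ghost data.
    interior_dslash<<<(n + 255) / 256, 256, 0, compute>>>(d_out, d_in, n);
    cudaEventRecord(interior_done, compute);

    // 2. Meanwhile the boundary data arrives (MPI exchange elided) and is
    //    copied to the GPU on the communication stream.
    cudaMemcpyAsync(d_ghost, h_ghost, nb * sizeof(float),
                    cudaMemcpyHostToDevice, comms);

    // 3. Exterior kernel waits for both the ghost-zone copy and the interior
    //    result, then accumulates the neighbor contributions on boundary sites.
    cudaStreamWaitEvent(comms, interior_done, 0);
    exterior_dslash<<<(nb + 255) / 256, 256, 0, comms>>>(d_out, d_ghost, nb);

    cudaDeviceSynchronize();
    cudaEventDestroy(interior_done);
    cudaStreamDestroy(compute); cudaStreamDestroy(comms);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_ghost); cudaFreeHost(h_ghost);
    return 0;
}
```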

6 Differences Between Wilson and Staggered Code
- Improved staggered quarks need both fat links and long links (more memory)
- Fat links are not unitary, so reconstruction is not used (more memory, more time); see the sketch below
- For multiGPU, need to fetch from three planes of the neighboring node (the long links connect sites three apart)
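For ordinary SU(3) gauge links the third row is redundant: it is the complex conjugate of the cross product of the first two rows, so only 12 (or even 8) real numbers per link need to be stored and the rest can be rebuilt in registers. Fat links are general 3x3 complex matrices, so all 18 numbers must be stored. A minimal sketch of 12-number reconstruction:

```cpp
// 12-reconstruct sketch: rebuild the third row of an SU(3) link from the
// first two rows, row3 = conj(row1 x row2). Fat links are not unitary,
// so this trick does not apply to them and all 18 reals are stored.
#include <complex>
#include <cstdio>

using cplx = std::complex<double>;

void reconstruct_third_row(const cplx r1[3], const cplx r2[3], cplx r3[3]) {
    r3[0] = std::conj(r1[1] * r2[2] - r1[2] * r2[1]);
    r3[1] = std::conj(r1[2] * r2[0] - r1[0] * r2[2]);
    r3[2] = std::conj(r1[0] * r2[1] - r1[1] * r2[0]);
}

int main() {
    // Identity matrix as a trivial SU(3) example: reconstruction gives (0,0,1).
    cplx r1[3] = {1, 0, 0}, r2[3] = {0, 1, 0}, r3[3];
    reconstruct_third_row(r1, r2, r3);
    std::printf("(%g,%g) (%g,%g) (%g,%g)\n",
                r3[0].real(), r3[0].imag(), r3[1].real(), r3[1].imag(),
                r3[2].real(), r3[2].imag());
    return 0;
}
```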

7 Benchmarks (Guochun, please check!)
Several systems are available for benchmarking:
- GTX 280: AC cluster at NCSA
- Tesla: FNAL, NERSC, JLab
- Fermi: NERSC

8 [Table: reconstruct, CG (GF/s), and multimass CG (GF/s) for DP, SP, and HP]
^3 × 32 lattice on GTX 280. Four masses for the last column.

9 [Table: standalone and goal performance (goal assumes 100 GB/s with PCIe overhead) for CG, MM-CG, fat link, gauge force, and fermion force]
All values in single-precision GF/s. CG speeds measured for 500 iterations. 24^3 × 32 lattice. 12-reconstruct used when possible. (A simple bandwidth-bound model behind the goal numbers is sketched below.)
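The goal numbers follow from a bandwidth-bound model: the staggered dslash is memory-bound, so the sustainable rate is roughly the arithmetic intensity (flops per byte of gauge-field and vector traffic) times the effective bandwidth. A minimal sketch, with per-site flop and byte counts that are my own rough estimates for a single-precision improved staggered dslash without reconstruction, not figures taken from the slides:

```cpp
// Bandwidth-bound performance model sketch (illustrative numbers, not
// taken from the slides). GF/s ~ (flops per site / bytes per site) * GB/s.
#include <cstdio>

int main() {
    // Rough single-precision estimates for improved staggered dslash with
    // no link reconstruction: 8 fat + 8 long links (18 floats each) plus
    // 16 neighbor vectors (6 floats each) read, 1 vector written.
    const double bytes_per_site = (16 * 18 + 16 * 6 + 6) * 4.0;   // ~1560 B
    const double flops_per_site = 16 * 66 + 15 * 6;               // ~1146 flops
    const double bandwidth_GBs  = 100.0;  // effective, including PCIe overhead

    const double intensity = flops_per_site / bytes_per_site;     // flops/byte
    const double gflops    = intensity * bandwidth_GBs;
    std::printf("arithmetic intensity: %.2f flops/byte\n", intensity);
    std::printf("expected rate at %.0f GB/s: %.0f GF/s\n", bandwidth_GBs, gflops);
    return 0;
}
```

With these rough counts the model predicts on the order of 70 GF/s at 100 GB/s, the same ballpark as the single-precision dslash rate reported on the Current Performance slide.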

10 Production Experience
- Have been using GPUs for electromagnetic effects, i.e., SU(3) × U(1) (see the sketch below)
- So far only using ensembles that fit on a single GPU
- Have analyzed several thousand configurations, from 20^3 × 64 to 24^3 × 96
- Running at AC (NCSA), FNAL, Dirac (NERSC), JLab
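In these electromagnetic-effects runs the quarks see a combined SU(3) × U(1) gauge field: each SU(3) link is multiplied by a U(1) phase built from the photon field before the staggered inverter is applied. A hedged sketch of that link dressing follows; the function and field names and the charge convention are illustrative, not the actual MILC interface.

```cpp
// Sketch of dressing SU(3) links with U(1) phases for electromagnetic
// effects. Names and conventions are illustrative only.
#include <complex>
#include <cstdio>

using cplx = std::complex<double>;

// Multiply one 3x3 SU(3) link by the U(1) phase exp(i * q * A_mu(x)).
void dress_link(cplx link[3][3], double charge, double photon_field) {
    const cplx phase = std::exp(cplx(0.0, charge * photon_field));
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            link[i][j] *= phase;
}

int main() {
    cplx U[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};   // trivial link
    dress_link(U, /*charge=*/2.0 / 3.0, /*photon_field=*/0.1);
    std::printf("U[0][0] = (%g, %g)\n", U[0][0].real(), U[0][0].imag());
    return 0;
}
```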

11 Future
- Study of heavy-light mesons might use GPUs; need to combine the clover and staggered inverters.
- Although we have asqtad code modules for gauge configuration generation, we are now generating HISQ configurations, so new code must be added.
- Investigate strong scaling as supercomputers reach for petaflop/s performance.
- Essential to decide what parts of production running can be profitably shifted to GPUs.

12 Backup slides after this point
The following slides were shown in Tucson.

13
- 8/17: I arrive at NCSA for sabbatical
- 8/19: Guochun Shi and I attend the JLab GPU workshop
- 8/31: Guochun compiles the BU code on NCSA's AC cluster; I compile on my laptop
- early Sept.: start weekly GPU meeting
- 9/30: Guochun has staggered dslash

14
- 10/18: Guochun is playing with BU options like half precision and different memory arrangements
- 10/20: using texture memory helps performance
- 11/2: struggling with getting phases right in fat link reconstruction
- 11/10: Guochun reports that CG works for asqtad
- 11/17: multimass inverter working

15
- 11/19: join the JLab GPU mailing list; goal is a multiGPU API
- 12/7: Guochun working on fat link computation
- 1/11/10: fat link running at 156 GF
- 1/12/10: start discussion of gauge force calculation at weekly meeting

16 Current Performance
- Single-precision dslash: 85 GF/s without any texture reads
- Multimass inverter (SP): 75 GF/s
- Fat link: 156 GF/s

17 Next Steps
- Guochun is working on the gauge force
- We are waiting for JLab progress on the multiGPU API
- I want to try to use the GPU inverters and fat-link routines for superscript

18 Additional Information
- NCSA GPU code is available via Subversion from kfs3.ncsa.uiuc.edu/projects/ncsa_projects/
- Look in quda/quda, quda/staggered_quda, and milc_qcd for the BU code
