MERGE-BASED SpMV PERFECT WORKLOAD BALANCE. GUARANTEED. Duane Merrill, NVIDIA Research
|
|
- Donald Gibbs
- 6 years ago
- Views:
Transcription
1 MERGE-BASED SpMV PERFECT WORKLOAD BALANCE. GUARANTEED. Dun Mrrill, NVIDIA Rsr
2 SPARSE MATRIX-VECTOR MULTIPLICATION SpMV (Ax = y) * = sprs mtrix A ns vtor x ns vtor y 2
3 SPARSE MATRIX-VECTOR MULTIPLICATION Lots o vill prlllism * = ()() + ()() 0.0 ()() + ()() ()() + ()() + ()() +()() sprs mtrix A ns vtor x ns vtor y 3
4 PARALLEL DECOMPOSITION 4
5 COMPRESSED SPARSE ROW (CSR) FORMAT 3-rry rprsnttion vlus olumn inis A row osts
6 CSR PARALLEL DECOMPOSITION ) Row-s vlus olumn inis A row osts p 0 p 1 p 2 p
7 CSR PARALLEL DECOMPOSITION ) Row-s p 0 p 1 p 2 p 3 imln! vlus olumn inis A row osts p 0 p 1 p 2 p
8 CSR PARALLEL DECOMPOSITION ) Nonzro splittin p 0 p 1 p 2 p vlus olumn inis A row osts
9 CSR PARALLEL DECOMPOSITION ) Nonzro splittin p 0 p 1 p 2 p A vlus olumn inis row osts p 0 p 1 p 2 p imln! 9
10 CSR PARALLEL DECOMPOSITION ) Mr-s (loil) p 0 p 1 p vlus row osts olumn inis A 10
11 PERFORMANCE CONSISTENCY Consistny is r ttr tn rr momnts o rtnss -Sott Ginsr 11
12 Exmpls srmv() wit ~35M non-zros (K40, p64) trmom_k (tmprtur ormtion) nr-2000 (W onntivity) ASIC_320k (iruit simultion) usparse (row-s, vtoriz prlll omposition): 12.4 GFLOPS 5.9 GFLOPS 0.12 GFLOPS 12
13 Exmpls srmv() wit ~35M non-zros (K40, p64) trmom_k (tmprtur ormtion) nr-2000 (W onntivity) ASIC_320k (iruit simultion) usparse (row-s, vtoriz prlll omposition): Mr-s: 12.4 GFLOPS 5.9 GFLOPS 0.12 GFLOPS 15.5 GFLOPS 16.7 GFLOPS 14.1 GFLOPS 13
14 PERFORMANCE (IN)CONSISTENCY Prllliztion souln t introu rtits in t prormn lnsp Sours o t-pnnt prormn rtits: Contntion Worklo imln Exrt on mssivly prlll GPUs >60 spiliz GPU-spii SpMV loritms n sprs mtrix ormts! 14
15 Runtim (ms) Runtim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) Mtris y siz usparse Mtris y siz Mr-s 15
16 Runtim (ms) Runtim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) Hily orrlt wit prolm siz! Mtris y siz usparse Mtris y siz Mr-s 16
17 2D MERGE PATH DECOMPOSITION Nrsin Do, Amit Jin, n Murlir Mii An optiml prlll loritm or mrin usin multisltion. In. Pross. Ltt. 50, 2 (April 1994), O, S. t l Mr Pt - Prlll Mrin M Simpl. Proins o t 2012 IEEE 26t Intrntionl Prlll n Distriut Prossin Symposium Worksops & PD Forum (Wsinton, DC, USA, 2012),
18 T 2D mr-pt visuliztion T ision pt runs rom top-lt to ottom-rit: Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A strt i n 18
19 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 19
20 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 20
21 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 21
22 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 22
23 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 23
24 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 24
25 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 25
26 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 26
27 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 27
28 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 28
29 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 29
30 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A i n 30
31 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 31
32 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 32
33 Prtitionin t mr pt p 0 i p 1 p 2 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 33
34 Prtitionin t mr pt p 0 i p 1 p 2 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 34
35 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j p 2 3. Trs run t sril mr loritm rom tir strtin points 35
36 Prtitionin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points Work is prtly lo-ln! 36
37 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 37
38 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 38
39 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 39
40 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 40
41 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! i 41
42 MERGE-BASED CsrMV 42
43 vlu inis (N) nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr CSR row-osts vs. N (t nonzro inis) row osts Prtition t pt into P rions 0 3. Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit 4 4. Fixup or prtil-sums rom rows tt ross prtitions
44 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions 0 3. Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit p 0 p Fixup or prtil-sums rom rows tt ross prtitions p
45 Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * p 0 p 1 p 2 1. Loilly mr row-osts vs. N (t nonzro inis) Prtition t pt into P rions p 0 p 0 3. Pt prossin: Aumult vlus wn movin own p p 1 Flus n rst umultor wn rit 4. Fixup or prtil-sums rom rows tt ross prtitions p p
46 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions Pt prossin: Aumult nonzro vlus wn movin own Flus n rst umultor wn rit Fixup or prtil-sums rom rows tt ross prtitions
47 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit Fixup or prtil-sums rom rows tt ross prtitions
48 Runnin tim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) usparse Mr-s Mu ir orrltion o runtim to prolm siz (0.79 vrsus 0.31) Mu lowr orrltion o FLOPS to row-lnt vrition (-0.02 vrsus -0.24) Mu lowr orrltion o FLOPS to row-lnt skw (-0.07 vrsus -0.23) Mtris y siz 48
49 QUESTIONS? Mrrill, D. n Grln, M Mr-s Prlll Sprs Mtrix-Vtor Multiplition usin t CSR Stor Formt. T. Rp. NVR , NVIDIA Corp. Furtr tnks n pprition or support rom DARPA PERFECT, Sn Bxtr, Mil Grln 49
50
Overview Linear Algebra Review Linear Algebra Review. What is a Matrix? Additional Resources. Basic Operations.
Oriw Ro Jnow Mon, Sptmr 2, 24 si mtri oprtions (, -, *) Cross n ot prouts Dtrminnts n inrss Homonous oorints Ortonorml sis itionl Rsours 8.6 Tt ook 6.837 Tt ook 6.837-stff@rpis.sil.mit.u Ck t ours wsit
More informationReachability. Directed DFS. Strong Connectivity Algorithm. Strong Connectivity. DFS tree rooted at v: vertices reachable from v via directed paths
irt Grphs OR SFO FW LX JFK MI OS irph is rph whos s r ll irt Short or irt rph pplitions on-wy strts lihts tsk shulin irphs ( 12.) irt Grphs 1 irt Grphs 2 irph Proprtis rph G=(V,) suh tht h os in on irtion:
More informationReview: Binary Trees. CSCI 262 Data Structures. Search Trees. In Order Traversal. Binary Search Trees 4/10/2018. Review: Binary Tree Implementation
Rviw: Binry Trs CSCI 262 Dt Struturs 21 Binry Srh Trs A inry tr is in rursivly: = or A inry tr is (mpty) root no with lt hil n riht hil, h o whih is inry tr. Rviw: Binry Tr Implmnttion Just ollow th rursiv
More informationReading. K-D Trees and Quad Trees. Geometric Data Structures. k-d Trees. Range Queries. Nearest Neighbor Search. Chapter 12.6
Rn Cptr 12.6 K-D Trs n Qu Trs CSE 326 Dt Struturs Ltur 9 2/2/05 K-D Trs n Qu Trs - Ltur 9 2 Gomtr Dt Struturs Ornzton o ponts, lns, plns, to support str prossn Appltons Astropsl smulton voluton o ls Grps
More informationWORKSHOP 2 Solid Shell Composites Modeling
WORKSHOP 2 Soli Shll Composits Moling WS2-1 WS2-2 Workshop Ojtivs Bom fmilir with stting up soli omposit shll mol Softwr Vrsion Ptrn 2011 MD Nstrn 2011.1 Fils Rquir soli_shll. WS2-3 Prolm Dsription Simult
More informationSoftware Pipelining Can we decrease the latency? Goal of SP Lecture. Seth Copen Goldstein Software Pipelining
5-4 Ltur 5-745 Sotwr Piplinin Copyrit St Copn Goltin -8 Sotwr Piplinin Sotwr piplinin i n IS tniqu tt rorr t intrution in loop. Poily movin intrution rom on itrtion to t prviou or t nxt itrtion. Vry lr
More informationMDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA .RADAR.MINIMUM.ALTITUDES. GES NDB MDP VILLA GESSELL MAR DEL PLATA ASTOR PIAZZOLLA
SZM/MQ MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: 36-30 10 20 30 40 50 37-00 37-30 330^ 300^ 55 NM 230^ 2300 20 80 60
More informationIterative Sparse Triangular Solves for Preconditioning
Euro-Par 2015, Vienna Aug 24-28, 2015 Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt, Edmond Chow and Jack Dongarra Incomplete Factorization Preconditioning Incomplete LU factorizations
More informationFlexible Batched Sparse Matrix-Vector Product on GPUs
ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems November 13, 217 Flexible Batched Sparse Matrix-Vector Product on GPUs Hartwig Anzt, Gary Collins, Jack Dongarra,
More informationPerformance, Scalability, and Numerical Stability of Manycore. Wen-mei Hwu University of Illinois at Urbana-Champaign
Prformn, Slility, nd Numril Stility of Mnyor Algorithms Wn-mi Hwu Univrsity of Illinois t Urn-Chmpign Cry XE6 Nods Blu Wtrs ontins,64 Cry XE6 omput nods. Dul-sokt Nod Two AMD Intrlgos hips 6 or moduls,
More informationNo valido para navegacion real
60 SZM/MQ STOR PIZZOLL MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: 36-30 80 78 NM 5 0 10 20 30 40 50 37-00 37-30 330^
More informationGPU ACCELERATION OF CHOLMOD: BATCHING, HYBRID AND MULTI-GPU
April 4-7, 2016 Silicon Valley GPU ACCELERATION OF CHOLMOD: BATCHING, HYBRID AND MULTI-GPU Steve Rennich, Darko Stosic, Tim Davis, April 6, 2016 OBJECTIVE Direct sparse methods are among the most widely
More information22nd STREET ELEVATION. 23rd STREET ELEVATION
SIN N OMPLX () Square : Washington 00 nd STRT LVTION Y PLN: T: MR, 0 PRINT ON 00% RYL ONTNT PPR 0 0 00 rd STRT LVTION ON-STG PU PPLITION LO LVTIONS nd and rd STRT NUMR: -0 L SIN N OMPLX () Square : Washington
More informationRATING: CT RING MOTOR CURRENT MEASURING CONNECT TO ANALOG INPUT #8 & +5VDC OL1 OL1 OL1 M1 STARTER PART #: M1 OVELOAD PART #: SETTING: OL2 OL2
8 7 5 4 00V L L L MIN ISNT PRT #: RTIN: RT SMPL RWIN LY: NS R RQUIR PR MIN L L L MIN US PRT #: RTIN: T RIN MOTOR URRNT MSURIN NT TO NLO INPUT #8 & +5V USS mp M M M MR MR MR OL OL OL M STRTR PRT #: P RTIN:
More informationAIR FORCE STANDARD DESIGN RPA SQUADRON OPERATIONS
RWIN IST NR NOTS 1. US TS OUMNTS IN ONUTION WIT T STNR SIN OUMNT N T INTRTIV PRORMMIN WORST. Sheet Number NM -1 OVR ST -1 STNR SIN PN -2 MOU PN -3 MOU XON -4 MOUS & PN & XON -5 MOUS & PN & XON -6 MOUS
More informationUniversal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation
ISO/IEC JTC1/SC2/WG2 N3308 L2/07-292 2007-09-01 Page 1 of 8 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Doc
More informationFinding a Funicular Curve Through Two Points
This is th glss pyrmi t th Louvr Musum in Pris, sign y rhitt I.M. Pi. It is support from nth y stl ls. In signing strutur suh s this, it is oftn most usful to slt l of rtin siz n tnsil strngth, n thn to
More informationDynamic Sparse Matrix Allocation on GPUs. James King
Dynamic Sparse Matrix Allocation on GPUs James King Graph Applications Dynamic updates to graphs Adding edges add entries to sparse matrix representation Motivation Graph operations (adding edges) (e.g.
More informationVishay Dale Plasma Panel Display Modules 128 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry FEATURES
P-G Plasma Panel isplay Modules x Graphics isplay with SII Input ontroller, / onverter and rive ircuitry TURS x pixel array for bright and vivid graphics. Parallel interface or RS- serial interface. Powerful
More informationGraph Theory & Applications. Boundaries Using Graphs. Graph Search. Find the route that minimizes. cost
Graph Thory & Appliations Bounaris Using Graphs 3 4 3 4 5 Fin th rout that minimizs osts Fin th ritial path in a projt Fin th optimal borr aroun a rgion Fin loop an no quations or analog iruit analysis
More informationSparse Matrix Formats
Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse
More informationINSTITUTE FOR PLASMA RESEARCH BHAT, GANDHINAGAR
R NOMNLTUR VLU TT TYP INVOLUT PRSSUR NL P.5 mm MOUL NUM *m NUM.5*m ILLT RIUS.*m WIT 5 mm +..5 etail R 5.5 ±. 5 etail MTRIL SS MR 5mmx5 Section view - QTY NOS SLOTS +. ll dimenssions are in mm RWN Y K Y
More informationPlasma Panel Display Modules 240 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry
Plasma Panel isplay Modules x Graphics isplay with SII Input ontroller, / onverter and rive ircuitry TURS x pixel array for bright and vivid graphics Parallel interface or RS- serial interface Powerful
More informationVAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SINGLE CHANNEL ENCODER
L00 LL -800-999-600 VT GX - IP VIDO ILD DD-ON/RTROIT SINGL HNNL NODR NTWORK ONNTORS SING: MTRIL: P + BS X7240 OLOR: DRK BLU SUSTINBILITY: MMORY: PV R 256 MB RM, 256 MB LSH BTTRY BKD- RL TIM LOK POWR: POWR
More informationBatched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs
Workshop on Batched, Reproducible, and Reduced Precision BLAS Atlanta, GA 02/25/2017 Batched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs Hartwig Anzt Joint work with Goran
More informationPortability, Scalability, and Numerical Stability in Accelerated Kernels
Portility, Slility, nd Numril Stility in Alrtd Krnls John Strtton Dotorl Cndidt: Univrsity of Illinois t Urn-Chmpign Snior Arhitt: MultiorWr In Outlin Prformn Portility Wht CPU progrmmrs nd to lrn from
More informationLecture Outline. Memory Hierarchy Management. Register Allocation. Register Allocation. Lecture 38. Cache Management. Managing the Memory Hierarchy
Ltur Outlin Mmory Hirrhy Mngmnt Rgistr Allotion Ltu8 (rom nots y G. Nul n R. Boik) Rgistr Allotion Rgistr intrrn grph Grph oloring huristis Spilling Ch Mngmnt 4/27/08 Pro. Hilingr CS164 Ltu8 1 4/27/08
More informationHSHM-H110AX-5CPX HSHM-H105BX-5CPX TYPE B21, 105 SIGNAL CONTACTS HSHM-HXXXXXX-5CPX-XXXXX
M TM HSHM PRSS-FIT HR, -ROW, HSHM SRIS FOR HIGH SP HR MTRI PPLITIONS * UP TO Gb/s T RTS * LOW ROSSTLK T HIGH FRQUNIS * / (SINGL-N/IFFRNTIL) IMPN * MOULR/SLL FORMT I -- * MT LINS PR INH * SHIPS WITH PROTTIV
More informationTYPICAL RAISED POSITION
UPPR 1. TH LOTION OF RMP LOSUR GTS ND MOUNTING HIGHT OF PIVOT SHLL VRIFID Y TH NGINR.. HIGHT OF GUIDS MY VRID S RQUIRD FOR WRNING LIGHT LRN. 3. FIRGLSS/LUMINUM ND SHLL SUPPLID Y TH SM VNDOR. 4. TO MOUNTD
More informationVAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SPECIFICATIONS PERSPECTIVE SIDE VIEW MOUNTING CONNECTIONS CALL SINGLE CHANNEL ENCODER
VT GX - IP VIDO ILD DD-ON/RTROIT L00 LL -800-999-600 SINGL HNNL NODR SPIITIONS NTWORK ONNTORS SING: MTRIL: P + BS X7240 OLOR: DRK BLU DIMNSIONS IN MILLIMTRS (DIMNSIONS IN INHS) SUSTINBILITY: MMORY: PV
More informationAdministrative Issues. L11: Sparse Linear Algebra on GPUs. Triangular Solve (STRSM) A Few Details 2/25/11. Next assignment, triangular solve
Administrative Issues L11: Sparse Linear Algebra on GPUs Next assignment, triangular solve Due 5PM, Tuesday, March 15 handin cs6963 lab 3 Project proposals Due 5PM, Wednesday, March 7 (hard
More informationSparse Matrix-Vector Multiplication for Circuit Simulation. Abstract
Sparse Matrix-Vector Multiplication for Circuit Simulation CSE260 (2012 Fall) Course Project Report Hao Zhuang *, and Jin Wang ** Computer Science and Engineering Department University of California, San
More informationFundamentals of Engineering Analysis ENGR Matrix Multiplication, Types
Fundmentls of Engineering Anlysis ENGR - Mtri Multiplition, Types Spring Slide Mtri Multiplition Define Conformle To multiply A * B, the mtries must e onformle. Given mtries: A m n nd B n p The numer of
More informationCCD Color Camera. KP-HD20A/KP-HD1001/5 Series. Remote control protocol Specifications. (preliminary version)
olor amera KP-H20/KP-H01/5 Series Remote control protocol Specifications (preliminary version) Hitachi Kokusai lectric Inc. MOL SIGN HK - Sep.11.9 (first edition) Hirayama Hirayama SYMOL T SRIPTION (RWN)
More informationL17: Introduction to Irregular Algorithms and MPI, cont.! November 8, 2011!
L17: Introduction to Irregular Algorithms and MPI, cont.! November 8, 2011! Administrative Class cancelled, Tuesday, November 15 Guest Lecture, Thursday, November 17, Ganesh Gopalakrishnan CUDA Project
More informationWAN Technology and Routing
PS 60 - Network Programming WN Technology and Routing Michele Weigle epartment of omputer Science lemson University mweigle@cs.clemson.edu March, 00 http://www.cs.clemson.edu/~mweigle/courses/cpsc60 WN
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationOn the Parallel Solution of Sparse Triangular Linear Systems. M. Naumov* San Jose, CA May 16, 2012 *NVIDIA
On the Parallel Solution of Sparse Triangular Linear Systems M. Naumov* San Jose, CA May 16, 2012 *NVIDIA Why Is This Interesting? There exist different classes of parallel problems Embarrassingly parallel
More informationCSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices
CSE 599 I Accelerated Computing - Programming GPUS Parallel Pattern: Sparse Matrices Objective Learn about various sparse matrix representations Consider how input data affects run-time performance of
More informationASSIGNMENT 9: CACHE MEMORY NAME. Assume we are building a cache for a memory system that s just 16 bytes big 4 address bits.
. SSIGNMNT : H MMORY NM PROLM : -YT H OR -YT MMORY. ssume we are building a cache for a memory system that s just bytes big address bits. We will make a direct mapped cache that has four set, so there
More informationBlock Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations
Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations D. Zheltkov, N. Zamarashkin INM RAS September 24, 2018 Scalability of Lanczos method Notations Matrix order
More informationCharacteristics of Fault Simulation. Fault Simulation Techniques. Parallel Fault Simulation. Parallel Fault Simulation
Chrtristis o Fult Simultion Fult tivity with rspt to ult-r iruit is otn sprs oth in tim n sp. For mpl F is not tivt y th givn pttrn, whil F2 ts only th lowr prt o this iruit. Fult Simultion Thniqus Prlll
More informationProduct Update Harsh Environment RJ45 MRJ CONNECTOR SERIES
Product Update Harsh nvironment RJ45 MRJ ONNTOR SRIS bout mphenol ommercial Products mphenol s ommercial onnector Products are used in a variety of end user applications including Networking, Telecom,
More informationPorting the NAS-NPB Conjugate Gradient Benchmark to CUDA. NVIDIA Corporation
Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA NVIDIA Corporation Outline! Overview of CG benchmark! Overview of CUDA Libraries! CUSPARSE! CUBLAS! Porting Sequence! Algorithm Analysis! Data/Code
More informationDistributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs
Distributed NVAMG Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Istvan Reguly (istvan.reguly at oerc.ox.ac.uk) Oxford e-research Centre NVIDIA Summer Internship
More informationATA - AXIS TURBOCHARGER ACCESSORIES CATALOG REV MMK
3 aterial 12411 3.62" R TRBIN OSIN DISCR LN TO 4" X40 DOWNPIP 12412 3.62" R TRBIN OSIN DISCR LN TO 4.2" LL RON 12413 3.62" R TRBIN OSIN TO 351CW X DPTR 4.415" 1/2" RON 12414 3.62" R TRBIN OSIN DISCR LN
More informationPerformance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi
Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi Erik Saule 1, Kamer Kaya 1 and Ümit V. Çatalyürek 1,2 esaule@uncc.edu, {kamer,umit}@bmi.osu.edu 1 Department of Biomedical
More informationReport of Linear Solver Implementation on GPU
Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,
More informationV = set of vertices (vertex / node) E = set of edges (v, w) (v, w in V)
Definitions G = (V, E) V = set of verties (vertex / noe) E = set of eges (v, w) (v, w in V) (v, w) orere => irete grph (igrph) (v, w) non-orere => unirete grph igrph: w is jent to v if there is n ege from
More informationBUILDING HIGH PERFORMANCE INPUT-ADAPTIVE GPU APPLICATIONS WITH NITRO
BUILDING HIGH PERFORMANCE INPUT-ADAPTIVE GPU APPLICATIONS WITH NITRO Saurav Muralidharan University of Utah nitro-tuner.github.io Disclaimers This research was funded in part by the U.S. Government. The
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More informationCompiling: Examples and Sample Problems
REs for Kywors Compiling: Exmpls n mpl Prolms IC312 Mchin-Lvl n ystms Progrmming Hnri Csnov (hnric@hwii.u) It is sy to fin RE tht scris ll kywors Ky = if ls for whil int.. Ths cn split in groups if n Kywor
More informationPrinter riendly View Page of 6 Service Manual: SYSTM WIRIN IRMS - 3500 POWR ISTRIUTION > 6.0 VIN U 2006 hevrolet utaway 6.0 ng 3500 ig : 6.0 VIN U, Power istribution ircuit ( of 6) K STRTR K R TTRY K USI
More informationBenchmarking SpMV on Many-Core Architecture
Bechmarkig SpMV o May-Core Architecture Biwei Xie ad Zhe Jia Istitute of Computig Techology Chiese Academy of Scieces Priceto Uiversity Why Bechmarkig? To Measure Is To Kow -- William Thomso (Lord Kelvi)
More informationEFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT
EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT JOSEPH L. GREATHOUSE, MAYANK DAGA AMD RESEARCH 11/20/2014 THIS TALK IN ONE SLIDE Demonstrate how to save space and time
More informationGPU Implementation of Elliptic Solvers in NWP. Numerical Weather- and Climate- Prediction
1/8 GPU Implementation of Elliptic Solvers in Numerical Weather- and Climate- Prediction Eike Hermann Müller, Robert Scheichl Department of Mathematical Sciences EHM, Xu Guo, Sinan Shi and RS: http://arxiv.org/abs/1302.7193
More informationEFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI
EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,
More informationA Comparative Study on Exact Triangle Counting Algorithms on the GPU
A Comparative Study on Exact Triangle Counting Algorithms on the GPU Leyuan Wang, Yangzihao Wang, Carl Yang, John D. Owens University of California, Davis, CA, USA 31 st May 2016 L. Wang, Y. Wang, C. Yang,
More informationAutomatic Development of Linear Algebra Libraries for the Tesla Series
Automatic Development of Linear Algebra Libraries for the Tesla Series Enrique S. Quintana-Ortí quintana@icc.uji.es Universidad Jaime I de Castellón (Spain) Dense Linear Algebra Major problems: Source
More informationGlobal Register Allocation
Ltur Outlin Glol Rgistr Allotion Mmory Hirrhy Mngmnt Rgistr Allotion vi Grph Coloring Rgistr intrrn grph Grph oloring huristis Spilling Ch Mngmnt 2 Th Mmory Hirrhy Rgistrs 1 yl 256-8000 yts Ch 3 yls 256k-16M
More informationMatrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs
Iterative Solvers Numerical Results Conclusion and outlook 1/18 Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Eike Hermann Müller, Robert Scheichl, Eero Vainikko
More informationGenerating Optimized Sparse Matrix Vector Product over Finite Fields
Generating Optimized Sparse Matrix Vector Product over Finite Fields Pascal Giorgi 1 and Bastien Vialla 1 LIRMM, CNRS, Université Montpellier 2, pascal.giorgi@lirmm.fr, bastien.vialla@lirmm.fr Abstract.
More informationNVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.
NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV 2016 Joe Eaton Ph.D. Agenda Accelerated Computing nvgraph New Features Coming Soon Dynamic Graphs GraphBLAS 2 ACCELERATED COMPUTING 10x Performance
More informationSITE PLANNING. Satellite City Hall. ARTS ENSEMBLE Hula Halau. Drop-off. Box Office. Event/Meeting Rooms. City Offices EXHIBITION HALL.
ST PLNNN ula alau Satellite ity all Pa ula TRR R SPORT S PVL LON ON S KN ST OO OON UT R NS PRO RMN Lo`i ity Offices ox Office vent/meeting Rooms RN KPOLN LV vent Plaza ish Pond lvis Statue STR TSP War
More informationIntroduction to Data Mining
Introduction to Data Mining Lctur #15: Clustring-2 Soul National Univrsity 1 In Tis Lctur Larn t motivation and advantag of BFR, an xtnsion of K-mans to vry larg data Larn t motivation and advantag of
More informationPerformance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply
Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply University of California, Berkeley Berkeley Benchmarking and Optimization Group (BeBOP) http://bebop.cs.berkeley.edu
More informationFigure 6.1: Truss topology optimization diagram.
6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.
More informationGenerating and Automatically Tuning OpenCL Code for Sparse Linear Algebra
Generating and Automatically Tuning OpenCL Code for Sparse Linear Algebra Dominik Grewe Anton Lokhmotov Media Processing Division ARM School of Informatics University of Edinburgh December 13, 2010 Introduction
More informationCarley Foundry Customer Specifications Index
01-SI-S-3-3224 - 0132-2030 10/02 02-5000 N 04-03-002 10 04-03-011 5 05-16 2-17-1989 0500-TS-001 06-03-004 16 06-03-015 3 074-6876-002 11-02 074-8432-119 6-28-02 2-O-01420 -- 3.O 4/1/86 4-74-4502 11 4-O-04188
More informationLeveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication. Steve Rennich Nvidia Developer Technology - Compute
Leveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication Steve Rennich Nvidia Developer Technology - Compute Block Sparse Matrix Vector Multiplication Sparse Matrix-Vector Multiplication
More informationHow to Write Fast Numerical Code
How to Write Fast Numerical Code Lecture: Memory bound computation, sparse linear algebra, OSKI Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh ATLAS Mflop/s Compile Execute
More informationISO VIEW COVER, EXPRESS EXIT 4X4 FLIP COVER OPEN VIEW EXPRESS EXIT ON TROUGH VIEW
RV MO WN T 00899MO OVL 07-JN-5 0078MO HUH 7-SP-5 5.90 RF.87 RF.000 RF ISO VIW SL OVR, 0.50 OVR XTNSION X FLIP OVR FGS-MX-- (NOT INLU IN KIT).07 RF 7.7 RF FLIP OVR OPN VIW SL X STRIGHT STION RF RKT, XPRSS
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationExploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement
Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Tim Davis (Texas A&M University) with Sanjay Ranka, Mohamed Gadou (University of Florida) Nuri Yeralan (Microsoft) NVIDIA
More informationHigh-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock
High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock Yangzihao Wang University of California, Davis yzhwang@ucdavis.edu March 24, 2014 Yangzihao Wang (yzhwang@ucdavis.edu)
More informationQR Decomposition on GPUs
QR Decomposition QR Algorithms Block Householder QR Andrew Kerr* 1 Dan Campbell 1 Mark Richards 2 1 Georgia Tech Research Institute 2 School of Electrical and Computer Engineering Georgia Institute of
More informationWORKSHOP 17 BOX BEAM WITH TRANSIENT LOAD
WORKSHOP 17 BOX BEAM WITH TRANSIENT LOAD WS17-1 WS17-2 Workshop Ojtivs Prform mol nlysis for linr ynmi mol. Also, prform linr trnsint nlysis for th mol. Viw th shp of th mol ovr tim. Crt n X vs Y plot
More informationSummer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics
Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one
More informationDerivation of Mueller matrix from Zemax Tetsu Anan
Derivation of Mueller matrix from Zemax 2018.03.10 Tetsu Anan Definition of Stokes Q, U, & V Ichimoto et al. 2008 Stenflo 1994 View towards the sun, that is, from the direction of the observer Signs of
More informationMixed Precision Methods
Mixed Precision Methods Mixed precision, use the lowest precision required to achieve a given accuracy outcome " Improves runtime, reduce power consumption, lower data movement " Reformulate to find correction
More informationHistory Rgistr Allotion Exmpl As ol s intrmit o Consir this progrm with six vrils: := + := + := - 1 Us in th originl FORTRAN ompilr (1950 s) Vry ru lg
Th Mmory Hirrhy Avn Compilrs CMPSCI 710 Spring 2003 Highr = smllr, str, losr to CPU A rl sktop mhin (min) Rgistr Allotion Emry Brgr rgistrs 8 intgr, 8 loting-point; 1-yl ltny L1 h 8K t & instrutions; 2-yl
More informationA GPU Sparse Direct Solver for AX=B
1 / 25 A GPU Sparse Direct Solver for AX=B Jonathan Hogg, Evgueni Ovtchinnikov, Jennifer Scott* STFC Rutherford Appleton Laboratory 26 March 2014 GPU Technology Conference San Jose, California * Thanks
More informationMinimal Memory Abstractions
Miniml Memory Astrtions (As implemented for BioWre Corp ) Nthn Sturtevnt University of Alert GAMES Group Ferury, 7 Tlk Overview Prt I: Building Astrtions Minimizing memory requirements Performnes mesures
More informationRoofline Model (Will be using this in HW2)
Parallel Architecture Announcements HW0 is due Friday night, thank you for those who have already submitted HW1 is due Wednesday night Today Computing operational intensity Dwarves and Motifs Stencil computation
More informationBlocking Optimization Strategies for Sparse Tensor Computation
Blocking Optimization Strategies for Sparse Tensor Computation Jee Choi 1, Xing Liu 1, Shaden Smith 2, and Tyler Simon 3 1 IBM T. J. Watson Research, 2 University of Minnesota, 3 University of Maryland
More informationLocality-Aware Software Throttling for Sparse Matrix Operation on GPUs
Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Yanhao Chen 1, Ari B. Hayes 1, Chi Zhang 2, Timothy Salmon 1, Eddy Z. Zhang 1 1. Rutgers University 2. University of Pittsburgh Sparse
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964
More informationComputational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors?
Computational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors? The CVDLab Team Francesco Furiani Tue, April 3, 2014 ROMA TRE UNIVERSITÀ DEGLI STUDI
More informationPARDISO Version Reference Sheet Fortran
PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly
More informationSTUDYING OPENMP WITH VAMPIR
STUDYING OPENMP WITH VAMPIR Case Studies Sparse Matrix Vector Multiplication Load Imbalances November 15, 2017 Studying OpenMP with Vampir 2 Sparse Matrix Vector Multiplication y 1 a 11 a n1 x 1 = y m
More informationHow to Write Fast Numerical Code Spring 2012 Lecture 13. Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato
How to Write Fast Numerical Code Spring 2012 Lecture 13 Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato ATLAS Mflop/s Compile Execute Measure Detect Hardware Parameters L1Size NR MulAdd
More informationLoGA: Low-Overhead GPU Accounting Using Events
10th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz Association www.kit.edu GPU Sharing GPUs increasingly popular in computing Not every application
More informationALASKA RAILROAD CORPORATION
LSK RI LRO ORPORT ION LSK R I LRO ORPORT ION LSK RILRO ORPORTION NGINRING SRVIS P.O. OX 107500, NHORG, LSK 99510-7500 SPN RPLMNT PORTG, K PROJT : SPN RPLMNT TITL PG TL OF ONTNTS 1 14 U:\\eng-projects\ridges\R
More informationEfficient PageRank and SpMV Computation on AMD GPUs
2010 39th International Conference on Parallel Processing Efficient PageRank and SpMV Computation on AMD GPUs Tianji WU, Bo WANG, Yi SHAN, Feng YAN, Yu WANG and Ningyi XU Department of Electronic Engineering,
More informationMDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA. .RADAR.MINIMUM.ALTITUDES. Trans level: By ATC GES NDB MDP VILLA GESSELL
SZM/MQ MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: ' 36-30 5 0 10 20 30 40 50 37-00 37-30 330^ 300^ 55 NM 230^ 2300 20
More informationCarley Foundry Customer Specifications Index
01-SI-S-3-3224 - 0132-2030 10/02 02-5000 N 04-03-002 10 04-03-011 5 05-16 2-17-1989 0500-TS-001 06-03-004 16 06-03-015 3 074-6876-002 11-02 074-8432-119 6-28-02 2-O-01420 -- 3.O 4/1/86 4-74-4502 11 4-O-04188
More informationCSE P 501 Compilers. Register Allocation Hal Perkins Spring UW CSE P 501 Spring 2018 P-1
CSE P 501 Compilrs Rgistr Allotion Hl Prkins Spring 2018 UW CSE P 501 Spring 2018 P-1 Agn Rgistr llotion onstrints Lol mthos Fstr ompil, slowr o, ut goo nough or lots o things (JITs, ) Glol llotion rgistr
More informationProduct Update Harsh Environment USB MUSB CONNECTOR SERIES
Product Update Harsh nvironment US MUS ONNTOR SRIS bout mphenol ommercial Products mphenol s ommercial onnector Products are used in a variety of end user applications including Networking, Telecom, Server
More informationParallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop
Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j
More informationGeometric Partitioning for Tomography
Geometric Partitioning for Tomography Jan-Willem Buurlage, CWI Amsterdam Rob Bisseling, Utrecht University Joost Batenburg, CWI Amsterdam 2018-09-03, Tokyo, Japan SIAM Conference on Parallel Processing
More information