MERGE-BASED SpMV PERFECT WORKLOAD BALANCE. GUARANTEED. Duane Merrill, NVIDIA Research

Size: px
Start display at page:

Download "MERGE-BASED SpMV PERFECT WORKLOAD BALANCE. GUARANTEED. Duane Merrill, NVIDIA Research"

Transcription

1 MERGE-BASED SpMV PERFECT WORKLOAD BALANCE. GUARANTEED. Dun Mrrill, NVIDIA Rsr

2 SPARSE MATRIX-VECTOR MULTIPLICATION SpMV (Ax = y) * = sprs mtrix A ns vtor x ns vtor y 2

3 SPARSE MATRIX-VECTOR MULTIPLICATION Lots o vill prlllism * = ()() + ()() 0.0 ()() + ()() ()() + ()() + ()() +()() sprs mtrix A ns vtor x ns vtor y 3

4 PARALLEL DECOMPOSITION 4

5 COMPRESSED SPARSE ROW (CSR) FORMAT 3-rry rprsnttion vlus olumn inis A row osts

6 CSR PARALLEL DECOMPOSITION ) Row-s vlus olumn inis A row osts p 0 p 1 p 2 p

7 CSR PARALLEL DECOMPOSITION ) Row-s p 0 p 1 p 2 p 3 imln! vlus olumn inis A row osts p 0 p 1 p 2 p

8 CSR PARALLEL DECOMPOSITION ) Nonzro splittin p 0 p 1 p 2 p vlus olumn inis A row osts

9 CSR PARALLEL DECOMPOSITION ) Nonzro splittin p 0 p 1 p 2 p A vlus olumn inis row osts p 0 p 1 p 2 p imln! 9

10 CSR PARALLEL DECOMPOSITION ) Mr-s (loil) p 0 p 1 p vlus row osts olumn inis A 10

11 PERFORMANCE CONSISTENCY Consistny is r ttr tn rr momnts o rtnss -Sott Ginsr 11

12 Exmpls srmv() wit ~35M non-zros (K40, p64) trmom_k (tmprtur ormtion) nr-2000 (W onntivity) ASIC_320k (iruit simultion) usparse (row-s, vtoriz prlll omposition): 12.4 GFLOPS 5.9 GFLOPS 0.12 GFLOPS 12

13 Exmpls srmv() wit ~35M non-zros (K40, p64) trmom_k (tmprtur ormtion) nr-2000 (W onntivity) ASIC_320k (iruit simultion) usparse (row-s, vtoriz prlll omposition): Mr-s: 12.4 GFLOPS 5.9 GFLOPS 0.12 GFLOPS 15.5 GFLOPS 16.7 GFLOPS 14.1 GFLOPS 13

14 PERFORMANCE (IN)CONSISTENCY Prllliztion souln t introu rtits in t prormn lnsp Sours o t-pnnt prormn rtits: Contntion Worklo imln Exrt on mssivly prlll GPUs >60 spiliz GPU-spii SpMV loritms n sprs mtrix ormts! 14

15 Runtim (ms) Runtim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) Mtris y siz usparse Mtris y siz Mr-s 15

16 Runtim (ms) Runtim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) Hily orrlt wit prolm siz! Mtris y siz usparse Mtris y siz Mr-s 16

17 2D MERGE PATH DECOMPOSITION Nrsin Do, Amit Jin, n Murlir Mii An optiml prlll loritm or mrin usin multisltion. In. Pross. Ltt. 50, 2 (April 1994), O, S. t l Mr Pt - Prlll Mrin M Simpl. Proins o t 2012 IEEE 26t Intrntionl Prlll n Distriut Prossin Symposium Worksops & PD Forum (Wsinton, DC, USA, 2012),

18 T 2D mr-pt visuliztion T ision pt runs rom top-lt to ottom-rit: Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A strt i n 18

19 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 19

20 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 20

21 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 21

22 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 22

23 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 23

24 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 24

25 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 25

26 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 26

27 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 27

28 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 28

29 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A n 29

30 T 2D mr-pt visuliztion i T ision pt runs rom top-lt to ottom-rit: strt Movs rit wn onsumin rom list A Movs own wn onsumin rom list B E stp prous n output Brk tis y lwys prrrin t lmnt rom list A i n 30

31 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 31

32 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 32

33 Prtitionin t mr pt p 0 i p 1 p 2 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 33

34 Prtitionin t mr pt p 0 i p 1 p 2 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 34

35 Prtitionin t mr pt i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j p 2 3. Trs run t sril mr loritm rom tir strtin points 35

36 Prtitionin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points Work is prtly lo-ln! 36

37 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 p 1 p 2 2. Trs sr lon ionls or 2D strtin oorints I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points 37

38 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 38

39 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 39

40 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! 40

41 Consumin t mr pt p 0 p 1 p 2 i 1. Prtition t ri into P qully-siz ionl rions (on tr pr rion) p 0 2. Trs sr lon ionls or 2D strtin oorints p 1 I.., Fin t irst (i,j) wr X i is rtr tn ll o t itms onsum or Y j 3. Trs run t sril mr loritm rom tir strtin points p 2 Work is prtly lo-ln! i 41

42 MERGE-BASED CsrMV 42

43 vlu inis (N) nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr CSR row-osts vs. N (t nonzro inis) row osts Prtition t pt into P rions 0 3. Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit 4 4. Fixup or prtil-sums rom rows tt ross prtitions

44 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions 0 3. Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit p 0 p Fixup or prtil-sums rom rows tt ross prtitions p

45 Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * p 0 p 1 p 2 1. Loilly mr row-osts vs. N (t nonzro inis) Prtition t pt into P rions p 0 p 0 3. Pt prossin: Aumult vlus wn movin own p p 1 Flus n rst umultor wn rit 4. Fixup or prtil-sums rom rows tt ross prtitions p p

46 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions Pt prossin: Aumult nonzro vlus wn movin own Flus n rst umultor wn rit Fixup or prtil-sums rom rows tt ross prtitions

47 Vlu inis (N) Ax nonzro ot-prout omponnts MERGE-BASED CsrMV * 1. Loilly mr row-osts vs. N (t nonzro inis) Row osts Prtition t pt into P rions Pt prossin: Aumult vlus wn movin own Flus n rst umultor wn rit Fixup or prtil-sums rom rows tt ross prtitions

48 Runnin tim (ms) SPMV PERFORMANCE LANDSCAPE T ntir Flori Sprs Mtrix Colltion: 4.2K tsts (K40, p64 CsrMV) usparse Mr-s Mu ir orrltion o runtim to prolm siz (0.79 vrsus 0.31) Mu lowr orrltion o FLOPS to row-lnt vrition (-0.02 vrsus -0.24) Mu lowr orrltion o FLOPS to row-lnt skw (-0.07 vrsus -0.23) Mtris y siz 48

49 QUESTIONS? Mrrill, D. n Grln, M Mr-s Prlll Sprs Mtrix-Vtor Multiplition usin t CSR Stor Formt. T. Rp. NVR , NVIDIA Corp. Furtr tnks n pprition or support rom DARPA PERFECT, Sn Bxtr, Mil Grln 49

50

Overview Linear Algebra Review Linear Algebra Review. What is a Matrix? Additional Resources. Basic Operations.

Overview Linear Algebra Review Linear Algebra Review. What is a Matrix? Additional Resources. Basic Operations. Oriw Ro Jnow Mon, Sptmr 2, 24 si mtri oprtions (, -, *) Cross n ot prouts Dtrminnts n inrss Homonous oorints Ortonorml sis itionl Rsours 8.6 Tt ook 6.837 Tt ook 6.837-stff@rpis.sil.mit.u Ck t ours wsit

More information

Reachability. Directed DFS. Strong Connectivity Algorithm. Strong Connectivity. DFS tree rooted at v: vertices reachable from v via directed paths

Reachability. Directed DFS. Strong Connectivity Algorithm. Strong Connectivity. DFS tree rooted at v: vertices reachable from v via directed paths irt Grphs OR SFO FW LX JFK MI OS irph is rph whos s r ll irt Short or irt rph pplitions on-wy strts lihts tsk shulin irphs ( 12.) irt Grphs 1 irt Grphs 2 irph Proprtis rph G=(V,) suh tht h os in on irtion:

More information

Review: Binary Trees. CSCI 262 Data Structures. Search Trees. In Order Traversal. Binary Search Trees 4/10/2018. Review: Binary Tree Implementation

Review: Binary Trees. CSCI 262 Data Structures. Search Trees. In Order Traversal. Binary Search Trees 4/10/2018. Review: Binary Tree Implementation Rviw: Binry Trs CSCI 262 Dt Struturs 21 Binry Srh Trs A inry tr is in rursivly: = or A inry tr is (mpty) root no with lt hil n riht hil, h o whih is inry tr. Rviw: Binry Tr Implmnttion Just ollow th rursiv

More information

Reading. K-D Trees and Quad Trees. Geometric Data Structures. k-d Trees. Range Queries. Nearest Neighbor Search. Chapter 12.6

Reading. K-D Trees and Quad Trees. Geometric Data Structures. k-d Trees. Range Queries. Nearest Neighbor Search. Chapter 12.6 Rn Cptr 12.6 K-D Trs n Qu Trs CSE 326 Dt Struturs Ltur 9 2/2/05 K-D Trs n Qu Trs - Ltur 9 2 Gomtr Dt Struturs Ornzton o ponts, lns, plns, to support str prossn Appltons Astropsl smulton voluton o ls Grps

More information

WORKSHOP 2 Solid Shell Composites Modeling

WORKSHOP 2 Solid Shell Composites Modeling WORKSHOP 2 Soli Shll Composits Moling WS2-1 WS2-2 Workshop Ojtivs Bom fmilir with stting up soli omposit shll mol Softwr Vrsion Ptrn 2011 MD Nstrn 2011.1 Fils Rquir soli_shll. WS2-3 Prolm Dsription Simult

More information

Software Pipelining Can we decrease the latency? Goal of SP Lecture. Seth Copen Goldstein Software Pipelining

Software Pipelining Can we decrease the latency? Goal of SP Lecture. Seth Copen Goldstein Software Pipelining 5-4 Ltur 5-745 Sotwr Piplinin Copyrit St Copn Goltin -8 Sotwr Piplinin Sotwr piplinin i n IS tniqu tt rorr t intrution in loop. Poily movin intrution rom on itrtion to t prviou or t nxt itrtion. Vry lr

More information

MDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA .RADAR.MINIMUM.ALTITUDES. GES NDB MDP VILLA GESSELL MAR DEL PLATA ASTOR PIAZZOLLA

MDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA .RADAR.MINIMUM.ALTITUDES. GES NDB MDP VILLA GESSELL MAR DEL PLATA ASTOR PIAZZOLLA SZM/MQ MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: 36-30 10 20 30 40 50 37-00 37-30 330^ 300^ 55 NM 230^ 2300 20 80 60

More information

Iterative Sparse Triangular Solves for Preconditioning

Iterative Sparse Triangular Solves for Preconditioning Euro-Par 2015, Vienna Aug 24-28, 2015 Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt, Edmond Chow and Jack Dongarra Incomplete Factorization Preconditioning Incomplete LU factorizations

More information

Flexible Batched Sparse Matrix-Vector Product on GPUs

Flexible Batched Sparse Matrix-Vector Product on GPUs ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems November 13, 217 Flexible Batched Sparse Matrix-Vector Product on GPUs Hartwig Anzt, Gary Collins, Jack Dongarra,

More information

Performance, Scalability, and Numerical Stability of Manycore. Wen-mei Hwu University of Illinois at Urbana-Champaign

Performance, Scalability, and Numerical Stability of Manycore. Wen-mei Hwu University of Illinois at Urbana-Champaign Prformn, Slility, nd Numril Stility of Mnyor Algorithms Wn-mi Hwu Univrsity of Illinois t Urn-Chmpign Cry XE6 Nods Blu Wtrs ontins,64 Cry XE6 omput nods. Dul-sokt Nod Two AMD Intrlgos hips 6 or moduls,

More information

No valido para navegacion real

No valido para navegacion real 60 SZM/MQ STOR PIZZOLL MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: 36-30 80 78 NM 5 0 10 20 30 40 50 37-00 37-30 330^

More information

GPU ACCELERATION OF CHOLMOD: BATCHING, HYBRID AND MULTI-GPU

GPU ACCELERATION OF CHOLMOD: BATCHING, HYBRID AND MULTI-GPU April 4-7, 2016 Silicon Valley GPU ACCELERATION OF CHOLMOD: BATCHING, HYBRID AND MULTI-GPU Steve Rennich, Darko Stosic, Tim Davis, April 6, 2016 OBJECTIVE Direct sparse methods are among the most widely

More information

22nd STREET ELEVATION. 23rd STREET ELEVATION

22nd STREET ELEVATION. 23rd STREET ELEVATION SIN N OMPLX () Square : Washington 00 nd STRT LVTION Y PLN: T: MR, 0 PRINT ON 00% RYL ONTNT PPR 0 0 00 rd STRT LVTION ON-STG PU PPLITION LO LVTIONS nd and rd STRT NUMR: -0 L SIN N OMPLX () Square : Washington

More information

RATING: CT RING MOTOR CURRENT MEASURING CONNECT TO ANALOG INPUT #8 & +5VDC OL1 OL1 OL1 M1 STARTER PART #: M1 OVELOAD PART #: SETTING: OL2 OL2

RATING: CT RING MOTOR CURRENT MEASURING CONNECT TO ANALOG INPUT #8 & +5VDC OL1 OL1 OL1 M1 STARTER PART #: M1 OVELOAD PART #: SETTING: OL2 OL2 8 7 5 4 00V L L L MIN ISNT PRT #: RTIN: RT SMPL RWIN LY: NS R RQUIR PR MIN L L L MIN US PRT #: RTIN: T RIN MOTOR URRNT MSURIN NT TO NLO INPUT #8 & +5V USS mp M M M MR MR MR OL OL OL M STRTR PRT #: P RTIN:

More information

AIR FORCE STANDARD DESIGN RPA SQUADRON OPERATIONS

AIR FORCE STANDARD DESIGN RPA SQUADRON OPERATIONS RWIN IST NR NOTS 1. US TS OUMNTS IN ONUTION WIT T STNR SIN OUMNT N T INTRTIV PRORMMIN WORST. Sheet Number NM -1 OVR ST -1 STNR SIN PN -2 MOU PN -3 MOU XON -4 MOUS & PN & XON -5 MOUS & PN & XON -6 MOUS

More information

Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation

Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC1/SC2/WG2 N3308 L2/07-292 2007-09-01 Page 1 of 8 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Doc

More information

Finding a Funicular Curve Through Two Points

Finding a Funicular Curve Through Two Points This is th glss pyrmi t th Louvr Musum in Pris, sign y rhitt I.M. Pi. It is support from nth y stl ls. In signing strutur suh s this, it is oftn most usful to slt l of rtin siz n tnsil strngth, n thn to

More information

Dynamic Sparse Matrix Allocation on GPUs. James King

Dynamic Sparse Matrix Allocation on GPUs. James King Dynamic Sparse Matrix Allocation on GPUs James King Graph Applications Dynamic updates to graphs Adding edges add entries to sparse matrix representation Motivation Graph operations (adding edges) (e.g.

More information

Vishay Dale Plasma Panel Display Modules 128 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry FEATURES

Vishay Dale Plasma Panel Display Modules 128 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry FEATURES P-G Plasma Panel isplay Modules x Graphics isplay with SII Input ontroller, / onverter and rive ircuitry TURS x pixel array for bright and vivid graphics. Parallel interface or RS- serial interface. Powerful

More information

Graph Theory & Applications. Boundaries Using Graphs. Graph Search. Find the route that minimizes. cost

Graph Theory & Applications. Boundaries Using Graphs. Graph Search. Find the route that minimizes. cost Graph Thory & Appliations Bounaris Using Graphs 3 4 3 4 5 Fin th rout that minimizs osts Fin th ritial path in a projt Fin th optimal borr aroun a rgion Fin loop an no quations or analog iruit analysis

More information

Sparse Matrix Formats

Sparse Matrix Formats Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse

More information

INSTITUTE FOR PLASMA RESEARCH BHAT, GANDHINAGAR

INSTITUTE FOR PLASMA RESEARCH BHAT, GANDHINAGAR R NOMNLTUR VLU TT TYP INVOLUT PRSSUR NL P.5 mm MOUL NUM *m NUM.5*m ILLT RIUS.*m WIT 5 mm +..5 etail R 5.5 ±. 5 etail MTRIL SS MR 5mmx5 Section view - QTY NOS SLOTS +. ll dimenssions are in mm RWN Y K Y

More information

Plasma Panel Display Modules 240 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry

Plasma Panel Display Modules 240 x 64 Graphics Display with ASCII Input Controller, DC/DC Converter and Drive Circuitry Plasma Panel isplay Modules x Graphics isplay with SII Input ontroller, / onverter and rive ircuitry TURS x pixel array for bright and vivid graphics Parallel interface or RS- serial interface Powerful

More information

VAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SINGLE CHANNEL ENCODER

VAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SINGLE CHANNEL ENCODER L00 LL -800-999-600 VT GX - IP VIDO ILD DD-ON/RTROIT SINGL HNNL NODR NTWORK ONNTORS SING: MTRIL: P + BS X7240 OLOR: DRK BLU SUSTINBILITY: MMORY: PV R 256 MB RM, 256 MB LSH BTTRY BKD- RL TIM LOK POWR: POWR

More information

Batched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs

Batched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs Workshop on Batched, Reproducible, and Reduced Precision BLAS Atlanta, GA 02/25/2017 Batched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs Hartwig Anzt Joint work with Goran

More information

Portability, Scalability, and Numerical Stability in Accelerated Kernels

Portability, Scalability, and Numerical Stability in Accelerated Kernels Portility, Slility, nd Numril Stility in Alrtd Krnls John Strtton Dotorl Cndidt: Univrsity of Illinois t Urn-Chmpign Snior Arhitt: MultiorWr In Outlin Prformn Portility Wht CPU progrmmrs nd to lrn from

More information

Lecture Outline. Memory Hierarchy Management. Register Allocation. Register Allocation. Lecture 38. Cache Management. Managing the Memory Hierarchy

Lecture Outline. Memory Hierarchy Management. Register Allocation. Register Allocation. Lecture 38. Cache Management. Managing the Memory Hierarchy Ltur Outlin Mmory Hirrhy Mngmnt Rgistr Allotion Ltu8 (rom nots y G. Nul n R. Boik) Rgistr Allotion Rgistr intrrn grph Grph oloring huristis Spilling Ch Mngmnt 4/27/08 Pro. Hilingr CS164 Ltu8 1 4/27/08

More information

HSHM-H110AX-5CPX HSHM-H105BX-5CPX TYPE B21, 105 SIGNAL CONTACTS HSHM-HXXXXXX-5CPX-XXXXX

HSHM-H110AX-5CPX HSHM-H105BX-5CPX TYPE B21, 105 SIGNAL CONTACTS HSHM-HXXXXXX-5CPX-XXXXX M TM HSHM PRSS-FIT HR, -ROW, HSHM SRIS FOR HIGH SP HR MTRI PPLITIONS * UP TO Gb/s T RTS * LOW ROSSTLK T HIGH FRQUNIS * / (SINGL-N/IFFRNTIL) IMPN * MOULR/SLL FORMT I -- * MT LINS PR INH * SHIPS WITH PROTTIV

More information

TYPICAL RAISED POSITION

TYPICAL RAISED POSITION UPPR 1. TH LOTION OF RMP LOSUR GTS ND MOUNTING HIGHT OF PIVOT SHLL VRIFID Y TH NGINR.. HIGHT OF GUIDS MY VRID S RQUIRD FOR WRNING LIGHT LRN. 3. FIRGLSS/LUMINUM ND SHLL SUPPLID Y TH SM VNDOR. 4. TO MOUNTD

More information

VAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SPECIFICATIONS PERSPECTIVE SIDE VIEW MOUNTING CONNECTIONS CALL SINGLE CHANNEL ENCODER

VAT GX - IP VIDEO FIELD ADD-ON/RETROFIT SPECIFICATIONS PERSPECTIVE SIDE VIEW MOUNTING CONNECTIONS CALL SINGLE CHANNEL ENCODER VT GX - IP VIDO ILD DD-ON/RTROIT L00 LL -800-999-600 SINGL HNNL NODR SPIITIONS NTWORK ONNTORS SING: MTRIL: P + BS X7240 OLOR: DRK BLU DIMNSIONS IN MILLIMTRS (DIMNSIONS IN INHS) SUSTINBILITY: MMORY: PV

More information

Administrative Issues. L11: Sparse Linear Algebra on GPUs. Triangular Solve (STRSM) A Few Details 2/25/11. Next assignment, triangular solve

Administrative Issues. L11: Sparse Linear Algebra on GPUs. Triangular Solve (STRSM) A Few Details 2/25/11. Next assignment, triangular solve Administrative Issues L11: Sparse Linear Algebra on GPUs Next assignment, triangular solve Due 5PM, Tuesday, March 15 handin cs6963 lab 3 Project proposals Due 5PM, Wednesday, March 7 (hard

More information

Sparse Matrix-Vector Multiplication for Circuit Simulation. Abstract

Sparse Matrix-Vector Multiplication for Circuit Simulation. Abstract Sparse Matrix-Vector Multiplication for Circuit Simulation CSE260 (2012 Fall) Course Project Report Hao Zhuang *, and Jin Wang ** Computer Science and Engineering Department University of California, San

More information

Fundamentals of Engineering Analysis ENGR Matrix Multiplication, Types

Fundamentals of Engineering Analysis ENGR Matrix Multiplication, Types Fundmentls of Engineering Anlysis ENGR - Mtri Multiplition, Types Spring Slide Mtri Multiplition Define Conformle To multiply A * B, the mtries must e onformle. Given mtries: A m n nd B n p The numer of

More information

CCD Color Camera. KP-HD20A/KP-HD1001/5 Series. Remote control protocol Specifications. (preliminary version)

CCD Color Camera. KP-HD20A/KP-HD1001/5 Series. Remote control protocol Specifications. (preliminary version) olor amera KP-H20/KP-H01/5 Series Remote control protocol Specifications (preliminary version) Hitachi Kokusai lectric Inc. MOL SIGN HK - Sep.11.9 (first edition) Hirayama Hirayama SYMOL T SRIPTION (RWN)

More information

L17: Introduction to Irregular Algorithms and MPI, cont.! November 8, 2011!

L17: Introduction to Irregular Algorithms and MPI, cont.! November 8, 2011! L17: Introduction to Irregular Algorithms and MPI, cont.! November 8, 2011! Administrative Class cancelled, Tuesday, November 15 Guest Lecture, Thursday, November 17, Ganesh Gopalakrishnan CUDA Project

More information

WAN Technology and Routing

WAN Technology and Routing PS 60 - Network Programming WN Technology and Routing Michele Weigle epartment of omputer Science lemson University mweigle@cs.clemson.edu March, 00 http://www.cs.clemson.edu/~mweigle/courses/cpsc60 WN

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

On the Parallel Solution of Sparse Triangular Linear Systems. M. Naumov* San Jose, CA May 16, 2012 *NVIDIA

On the Parallel Solution of Sparse Triangular Linear Systems. M. Naumov* San Jose, CA May 16, 2012 *NVIDIA On the Parallel Solution of Sparse Triangular Linear Systems M. Naumov* San Jose, CA May 16, 2012 *NVIDIA Why Is This Interesting? There exist different classes of parallel problems Embarrassingly parallel

More information

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices CSE 599 I Accelerated Computing - Programming GPUS Parallel Pattern: Sparse Matrices Objective Learn about various sparse matrix representations Consider how input data affects run-time performance of

More information

ASSIGNMENT 9: CACHE MEMORY NAME. Assume we are building a cache for a memory system that s just 16 bytes big 4 address bits.

ASSIGNMENT 9: CACHE MEMORY NAME. Assume we are building a cache for a memory system that s just 16 bytes big 4 address bits. . SSIGNMNT : H MMORY NM PROLM : -YT H OR -YT MMORY. ssume we are building a cache for a memory system that s just bytes big address bits. We will make a direct mapped cache that has four set, so there

More information

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations D. Zheltkov, N. Zamarashkin INM RAS September 24, 2018 Scalability of Lanczos method Notations Matrix order

More information

Characteristics of Fault Simulation. Fault Simulation Techniques. Parallel Fault Simulation. Parallel Fault Simulation

Characteristics of Fault Simulation. Fault Simulation Techniques. Parallel Fault Simulation. Parallel Fault Simulation Chrtristis o Fult Simultion Fult tivity with rspt to ult-r iruit is otn sprs oth in tim n sp. For mpl F is not tivt y th givn pttrn, whil F2 ts only th lowr prt o this iruit. Fult Simultion Thniqus Prlll

More information

Product Update Harsh Environment RJ45 MRJ CONNECTOR SERIES

Product Update Harsh Environment RJ45 MRJ CONNECTOR SERIES Product Update Harsh nvironment RJ45 MRJ ONNTOR SRIS bout mphenol ommercial Products mphenol s ommercial onnector Products are used in a variety of end user applications including Networking, Telecom,

More information

Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA. NVIDIA Corporation

Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA. NVIDIA Corporation Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA NVIDIA Corporation Outline! Overview of CG benchmark! Overview of CUDA Libraries! CUSPARSE! CUBLAS! Porting Sequence! Algorithm Analysis! Data/Code

More information

Distributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs

Distributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Distributed NVAMG Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Istvan Reguly (istvan.reguly at oerc.ox.ac.uk) Oxford e-research Centre NVIDIA Summer Internship

More information

ATA - AXIS TURBOCHARGER ACCESSORIES CATALOG REV MMK

ATA - AXIS TURBOCHARGER ACCESSORIES CATALOG REV MMK 3 aterial 12411 3.62" R TRBIN OSIN DISCR LN TO 4" X40 DOWNPIP 12412 3.62" R TRBIN OSIN DISCR LN TO 4.2" LL RON 12413 3.62" R TRBIN OSIN TO 351CW X DPTR 4.415" 1/2" RON 12414 3.62" R TRBIN OSIN DISCR LN

More information

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi Erik Saule 1, Kamer Kaya 1 and Ümit V. Çatalyürek 1,2 esaule@uncc.edu, {kamer,umit}@bmi.osu.edu 1 Department of Biomedical

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

V = set of vertices (vertex / node) E = set of edges (v, w) (v, w in V)

V = set of vertices (vertex / node) E = set of edges (v, w) (v, w in V) Definitions G = (V, E) V = set of verties (vertex / noe) E = set of eges (v, w) (v, w in V) (v, w) orere => irete grph (igrph) (v, w) non-orere => unirete grph igrph: w is jent to v if there is n ege from

More information

BUILDING HIGH PERFORMANCE INPUT-ADAPTIVE GPU APPLICATIONS WITH NITRO

BUILDING HIGH PERFORMANCE INPUT-ADAPTIVE GPU APPLICATIONS WITH NITRO BUILDING HIGH PERFORMANCE INPUT-ADAPTIVE GPU APPLICATIONS WITH NITRO Saurav Muralidharan University of Utah nitro-tuner.github.io Disclaimers This research was funded in part by the U.S. Government. The

More information

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography 1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography

More information

Compiling: Examples and Sample Problems

Compiling: Examples and Sample Problems REs for Kywors Compiling: Exmpls n mpl Prolms IC312 Mchin-Lvl n ystms Progrmming Hnri Csnov (hnric@hwii.u) It is sy to fin RE tht scris ll kywors Ky = if ls for whil int.. Ths cn split in groups if n Kywor

More information

Printer riendly View Page of 6 Service Manual: SYSTM WIRIN IRMS - 3500 POWR ISTRIUTION > 6.0 VIN U 2006 hevrolet utaway 6.0 ng 3500 ig : 6.0 VIN U, Power istribution ircuit ( of 6) K STRTR K R TTRY K USI

More information

Benchmarking SpMV on Many-Core Architecture

Benchmarking SpMV on Many-Core Architecture Bechmarkig SpMV o May-Core Architecture Biwei Xie ad Zhe Jia Istitute of Computig Techology Chiese Academy of Scieces Priceto Uiversity Why Bechmarkig? To Measure Is To Kow -- William Thomso (Lord Kelvi)

More information

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT JOSEPH L. GREATHOUSE, MAYANK DAGA AMD RESEARCH 11/20/2014 THIS TALK IN ONE SLIDE Demonstrate how to save space and time

More information

GPU Implementation of Elliptic Solvers in NWP. Numerical Weather- and Climate- Prediction

GPU Implementation of Elliptic Solvers in NWP. Numerical Weather- and Climate- Prediction 1/8 GPU Implementation of Elliptic Solvers in Numerical Weather- and Climate- Prediction Eike Hermann Müller, Robert Scheichl Department of Mathematical Sciences EHM, Xu Guo, Sinan Shi and RS: http://arxiv.org/abs/1302.7193

More information

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,

More information

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A Comparative Study on Exact Triangle Counting Algorithms on the GPU A Comparative Study on Exact Triangle Counting Algorithms on the GPU Leyuan Wang, Yangzihao Wang, Carl Yang, John D. Owens University of California, Davis, CA, USA 31 st May 2016 L. Wang, Y. Wang, C. Yang,

More information

Automatic Development of Linear Algebra Libraries for the Tesla Series

Automatic Development of Linear Algebra Libraries for the Tesla Series Automatic Development of Linear Algebra Libraries for the Tesla Series Enrique S. Quintana-Ortí quintana@icc.uji.es Universidad Jaime I de Castellón (Spain) Dense Linear Algebra Major problems: Source

More information

Global Register Allocation

Global Register Allocation Ltur Outlin Glol Rgistr Allotion Mmory Hirrhy Mngmnt Rgistr Allotion vi Grph Coloring Rgistr intrrn grph Grph oloring huristis Spilling Ch Mngmnt 2 Th Mmory Hirrhy Rgistrs 1 yl 256-8000 yts Ch 3 yls 256k-16M

More information

Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs

Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Iterative Solvers Numerical Results Conclusion and outlook 1/18 Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Eike Hermann Müller, Robert Scheichl, Eero Vainikko

More information

Generating Optimized Sparse Matrix Vector Product over Finite Fields

Generating Optimized Sparse Matrix Vector Product over Finite Fields Generating Optimized Sparse Matrix Vector Product over Finite Fields Pascal Giorgi 1 and Bastien Vialla 1 LIRMM, CNRS, Université Montpellier 2, pascal.giorgi@lirmm.fr, bastien.vialla@lirmm.fr Abstract.

More information

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D. NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV 2016 Joe Eaton Ph.D. Agenda Accelerated Computing nvgraph New Features Coming Soon Dynamic Graphs GraphBLAS 2 ACCELERATED COMPUTING 10x Performance

More information

SITE PLANNING. Satellite City Hall. ARTS ENSEMBLE Hula Halau. Drop-off. Box Office. Event/Meeting Rooms. City Offices EXHIBITION HALL.

SITE PLANNING. Satellite City Hall. ARTS ENSEMBLE Hula Halau. Drop-off. Box Office. Event/Meeting Rooms. City Offices EXHIBITION HALL. ST PLNNN ula alau Satellite ity all Pa ula TRR R SPORT S PVL LON ON S KN ST OO OON UT R NS PRO RMN Lo`i ity Offices ox Office vent/meeting Rooms RN KPOLN LV vent Plaza ish Pond lvis Statue STR TSP War

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lctur #15: Clustring-2 Soul National Univrsity 1 In Tis Lctur Larn t motivation and advantag of BFR, an xtnsion of K-mans to vry larg data Larn t motivation and advantag of

More information

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply University of California, Berkeley Berkeley Benchmarking and Optimization Group (BeBOP) http://bebop.cs.berkeley.edu

More information

Figure 6.1: Truss topology optimization diagram.

Figure 6.1: Truss topology optimization diagram. 6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.

More information

Generating and Automatically Tuning OpenCL Code for Sparse Linear Algebra

Generating and Automatically Tuning OpenCL Code for Sparse Linear Algebra Generating and Automatically Tuning OpenCL Code for Sparse Linear Algebra Dominik Grewe Anton Lokhmotov Media Processing Division ARM School of Informatics University of Edinburgh December 13, 2010 Introduction

More information

Carley Foundry Customer Specifications Index

Carley Foundry Customer Specifications Index 01-SI-S-3-3224 - 0132-2030 10/02 02-5000 N 04-03-002 10 04-03-011 5 05-16 2-17-1989 0500-TS-001 06-03-004 16 06-03-015 3 074-6876-002 11-02 074-8432-119 6-28-02 2-O-01420 -- 3.O 4/1/86 4-74-4502 11 4-O-04188

More information

Leveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication. Steve Rennich Nvidia Developer Technology - Compute

Leveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication. Steve Rennich Nvidia Developer Technology - Compute Leveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication Steve Rennich Nvidia Developer Technology - Compute Block Sparse Matrix Vector Multiplication Sparse Matrix-Vector Multiplication

More information

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: Memory bound computation, sparse linear algebra, OSKI Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh ATLAS Mflop/s Compile Execute

More information

ISO VIEW COVER, EXPRESS EXIT 4X4 FLIP COVER OPEN VIEW EXPRESS EXIT ON TROUGH VIEW

ISO VIEW COVER, EXPRESS EXIT 4X4 FLIP COVER OPEN VIEW EXPRESS EXIT ON TROUGH VIEW RV MO WN T 00899MO OVL 07-JN-5 0078MO HUH 7-SP-5 5.90 RF.87 RF.000 RF ISO VIW SL OVR, 0.50 OVR XTNSION X FLIP OVR FGS-MX-- (NOT INLU IN KIT).07 RF 7.7 RF FLIP OVR OPN VIW SL X STRIGHT STION RF RKT, XPRSS

More information

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about

More information

Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement

Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Tim Davis (Texas A&M University) with Sanjay Ranka, Mohamed Gadou (University of Florida) Nuri Yeralan (Microsoft) NVIDIA

More information

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock Yangzihao Wang University of California, Davis yzhwang@ucdavis.edu March 24, 2014 Yangzihao Wang (yzhwang@ucdavis.edu)

More information

QR Decomposition on GPUs

QR Decomposition on GPUs QR Decomposition QR Algorithms Block Householder QR Andrew Kerr* 1 Dan Campbell 1 Mark Richards 2 1 Georgia Tech Research Institute 2 School of Electrical and Computer Engineering Georgia Institute of

More information

WORKSHOP 17 BOX BEAM WITH TRANSIENT LOAD

WORKSHOP 17 BOX BEAM WITH TRANSIENT LOAD WORKSHOP 17 BOX BEAM WITH TRANSIENT LOAD WS17-1 WS17-2 Workshop Ojtivs Prform mol nlysis for linr ynmi mol. Also, prform linr trnsint nlysis for th mol. Viw th shp of th mol ovr tim. Crt n X vs Y plot

More information

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one

More information

Derivation of Mueller matrix from Zemax Tetsu Anan

Derivation of Mueller matrix from Zemax Tetsu Anan Derivation of Mueller matrix from Zemax 2018.03.10 Tetsu Anan Definition of Stokes Q, U, & V Ichimoto et al. 2008 Stenflo 1994 View towards the sun, that is, from the direction of the observer Signs of

More information

Mixed Precision Methods

Mixed Precision Methods Mixed Precision Methods Mixed precision, use the lowest precision required to achieve a given accuracy outcome " Improves runtime, reduce power consumption, lower data movement " Reformulate to find correction

More information

History Rgistr Allotion Exmpl As ol s intrmit o Consir this progrm with six vrils: := + := + := - 1 Us in th originl FORTRAN ompilr (1950 s) Vry ru lg

History Rgistr Allotion Exmpl As ol s intrmit o Consir this progrm with six vrils: := + := + := - 1 Us in th originl FORTRAN ompilr (1950 s) Vry ru lg Th Mmory Hirrhy Avn Compilrs CMPSCI 710 Spring 2003 Highr = smllr, str, losr to CPU A rl sktop mhin (min) Rgistr Allotion Emry Brgr rgistrs 8 intgr, 8 loting-point; 1-yl ltny L1 h 8K t & instrutions; 2-yl

More information

A GPU Sparse Direct Solver for AX=B

A GPU Sparse Direct Solver for AX=B 1 / 25 A GPU Sparse Direct Solver for AX=B Jonathan Hogg, Evgueni Ovtchinnikov, Jennifer Scott* STFC Rutherford Appleton Laboratory 26 March 2014 GPU Technology Conference San Jose, California * Thanks

More information

Minimal Memory Abstractions

Minimal Memory Abstractions Miniml Memory Astrtions (As implemented for BioWre Corp ) Nthn Sturtevnt University of Alert GAMES Group Ferury, 7 Tlk Overview Prt I: Building Astrtions Minimizing memory requirements Performnes mesures

More information

Roofline Model (Will be using this in HW2)

Roofline Model (Will be using this in HW2) Parallel Architecture Announcements HW0 is due Friday night, thank you for those who have already submitted HW1 is due Wednesday night Today Computing operational intensity Dwarves and Motifs Stencil computation

More information

Blocking Optimization Strategies for Sparse Tensor Computation

Blocking Optimization Strategies for Sparse Tensor Computation Blocking Optimization Strategies for Sparse Tensor Computation Jee Choi 1, Xing Liu 1, Shaden Smith 2, and Tyler Simon 3 1 IBM T. J. Watson Research, 2 University of Minnesota, 3 University of Maryland

More information

Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs

Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Yanhao Chen 1, Ari B. Hayes 1, Chi Zhang 2, Timothy Salmon 1, Eddy Z. Zhang 1 1. Rutgers University 2. University of Pittsburgh Sparse

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964

More information

Computational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors?

Computational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors? Computational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors? The CVDLab Team Francesco Furiani Tue, April 3, 2014 ROMA TRE UNIVERSITÀ DEGLI STUDI

More information

PARDISO Version Reference Sheet Fortran

PARDISO Version Reference Sheet Fortran PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly

More information

STUDYING OPENMP WITH VAMPIR

STUDYING OPENMP WITH VAMPIR STUDYING OPENMP WITH VAMPIR Case Studies Sparse Matrix Vector Multiplication Load Imbalances November 15, 2017 Studying OpenMP with Vampir 2 Sparse Matrix Vector Multiplication y 1 a 11 a n1 x 1 = y m

More information

How to Write Fast Numerical Code Spring 2012 Lecture 13. Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato

How to Write Fast Numerical Code Spring 2012 Lecture 13. Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato How to Write Fast Numerical Code Spring 2012 Lecture 13 Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato ATLAS Mflop/s Compile Execute Measure Detect Hardware Parameters L1Size NR MulAdd

More information

LoGA: Low-Overhead GPU Accounting Using Events

LoGA: Low-Overhead GPU Accounting Using Events 10th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz Association www.kit.edu GPU Sharing GPUs increasingly popular in computing Not every application

More information

ALASKA RAILROAD CORPORATION

ALASKA RAILROAD CORPORATION LSK RI LRO ORPORT ION LSK R I LRO ORPORT ION LSK RILRO ORPORTION NGINRING SRVIS P.O. OX 107500, NHORG, LSK 99510-7500 SPN RPLMNT PORTG, K PROJT : SPN RPLMNT TITL PG TL OF ONTNTS 1 14 U:\\eng-projects\ridges\R

More information

Efficient PageRank and SpMV Computation on AMD GPUs

Efficient PageRank and SpMV Computation on AMD GPUs 2010 39th International Conference on Parallel Processing Efficient PageRank and SpMV Computation on AMD GPUs Tianji WU, Bo WANG, Yi SHAN, Feng YAN, Yu WANG and Ningyi XU Department of Electronic Engineering,

More information

MDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA. .RADAR.MINIMUM.ALTITUDES. Trans level: By ATC GES NDB MDP VILLA GESSELL

MDP NO VALIDO PARA NAVEGACION REAL MAR DEL PLATA, ARGENTINA. .RADAR.MINIMUM.ALTITUDES. Trans level: By ATC GES NDB MDP VILLA GESSELL SZM/MQ MR EL PLT pproach pt Elev 118.75 Spanish ly 73' lt Set: hpa MR EL PLT, RGENTIN.RR.MINIMUM.LTITUES. Trans level: y T Trans alt: ' 36-30 5 0 10 20 30 40 50 37-00 37-30 330^ 300^ 55 NM 230^ 2300 20

More information

Carley Foundry Customer Specifications Index

Carley Foundry Customer Specifications Index 01-SI-S-3-3224 - 0132-2030 10/02 02-5000 N 04-03-002 10 04-03-011 5 05-16 2-17-1989 0500-TS-001 06-03-004 16 06-03-015 3 074-6876-002 11-02 074-8432-119 6-28-02 2-O-01420 -- 3.O 4/1/86 4-74-4502 11 4-O-04188

More information

CSE P 501 Compilers. Register Allocation Hal Perkins Spring UW CSE P 501 Spring 2018 P-1

CSE P 501 Compilers. Register Allocation Hal Perkins Spring UW CSE P 501 Spring 2018 P-1 CSE P 501 Compilrs Rgistr Allotion Hl Prkins Spring 2018 UW CSE P 501 Spring 2018 P-1 Agn Rgistr llotion onstrints Lol mthos Fstr ompil, slowr o, ut goo nough or lots o things (JITs, ) Glol llotion rgistr

More information

Product Update Harsh Environment USB MUSB CONNECTOR SERIES

Product Update Harsh Environment USB MUSB CONNECTOR SERIES Product Update Harsh nvironment US MUS ONNTOR SRIS bout mphenol ommercial Products mphenol s ommercial onnector Products are used in a variety of end user applications including Networking, Telecom, Server

More information

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j

More information

Geometric Partitioning for Tomography

Geometric Partitioning for Tomography Geometric Partitioning for Tomography Jan-Willem Buurlage, CWI Amsterdam Rob Bisseling, Utrecht University Joost Batenburg, CWI Amsterdam 2018-09-03, Tokyo, Japan SIAM Conference on Parallel Processing

More information