C PGAS XcalableMP(XMP) Unified Parallel
|
|
- Milo Cannon
- 5 years ago
- Views:
Transcription
1 PGAS XcalableMP Unified Parallel C 1 2 1, 2 1, 2, 3 C PGAS XcalableMP(XMP) Unified Parallel C(UPC) XMP UPC XMP UPC 1 Berkeley UPC GASNet 1. MPI MPI 1 Center for Computational Sciences, University of Tsukuba 2 Graduate School of Systems and Information Engineering, University of Tsukuba 3 RIKEN Advanced Institute for Computational Science Partitioned Global Address SpacePGASPGAS MPI Single Program Multiple Data MPI XcalableMPXMP 1),2) Unified Parallel CUPC 3) PGAS XMP UPC C XMP UPC UPC XMP UPC 2 PGAS XMP UPC 3 4 read/writelaplace Solver NAS Parallel Benchmarks NPB 4) Conjugate GradientCG5 2. Partitioned Global Address Space 2.1 Partitioned Global Address Space PGAS PGAS PGAS 1 c 2011 Information Processing Society of Japan
2 MPI PGAS XMP UPC instance of executionxmp UPC MPI 2.2 XcalableMP XMP XcalableMP XMP Spec WG e-science 5) XMP High Performance FortranHPF 7) XMP HPF HPF XMP XMP C Fortran C XMP XMP Fig. 1 XMP Fig. 2 gmove gmove XMP Fortran :Fig. 2 a2[] N/2 N-1 a1[] 0 N/2-1 for Fig. 3 loop Fig. 3 t 2.3 Unified Parallel C UPC PGAS 1 UPC Consortium 6) UPC #pragma xmp template t(0:n-1) template t index 0 N-1 #pragma xmp nodes p(4) #pragma xmp distribute t(block) onto p node 1 node 2 node 3 node 4 index 0 N/4-1 N/2-1 3*N/4-1 N-1 #pragma xmp align a[i] with t(i) a[] node 1 node 2 node 3 node 4 node 1 node 2 node 3 node 4 index 0 N/4-1 N/2-1 3*N/4-1 N-1 1 (XMP) Fig. 1 Conceptual diagram of template(xmp) #pragma xmp gmove a1[0:n/2-1] = a2[n/2:n-1]; 2 Gmove (XMP) Fig. 2 Example of gmove directive(xmp) #pragma xmp loop on t(i) for(i = 0; i < N; i++){ a[i] = func(i); } 3 Loop (XMP) Fig. 3 Example of loop directive(xmp) shared double a1[100], a2[100]; upc_memcpy(a1, a2, 100*sizeof(double)); 4 UPC (UPC) Fig. 4 How to declare and transfer shared data(upc) shared 1 UPC Fig. 4 Fig double a1[] a2[] a1[] a2[] Block shared [10] double a[100]; shared Block Fig. 4 upc memcpy() a2 100*sizeof(double) a1 upc memget() upc memput() for Fig. 5 upc forall upc forall 4 3 C 3. XcalableMP Unified Parallel C XMP UPC 2 c 2011 Information Processing Society of Japan
3 upc_forall( i=0; i<n; i++; &a[i]){ a[i] = func(i); } 5 upc forall (UPC) Fig. 5 Example of upc forall(upc) #pragma xmp nodes p(2, 2) #pragma xmp template t(0:9, 0:9) #pragma xmp distribute t(block, cyclic) onto p int a[10][10]; #pragma xmp align a[i][j] with t(j, i) 6 2 (XMP) Fig. 6 Example of distribution of two-dimensional array(xmp) 1 Fig. 6 Table 1 Indexes of each process in Fig. 6 Process 1st indexes 2nd indexes of a[][] of a[][] p(1,1) 0, 1, 2, 3, 4 0, 2, 4, 6, 8 p(2,1) 5, 6, 7, 8, 9 0, 2, 4, 6, 8 p(1,2) 0, 1, 2, 3, 4 1, 3, 5, 7, 9 p(2,2) 5, 6, 7, 8, 9 1, 3, 5, 7, 9 Fig. 7 #pragma xmp coarray y[1:3] = x[2:4]:[3]; 7 Co-array (XMP) Example of Co-array Function(XMP) 3.1 CPU XMP UPC cyclicblockblock-cyclic XMP gblock XMP Fig Fig. 6 Table 1 UPC 1 UPC XMP XMP XMP UPC upc alloc() XMP 3.2 XMP UPC XMP UPC UPC strict/relaxed 2 strict relaxed relaxed UPC UPC XMP UPC UPC XMP Fig. 2 UPC XMP 3.3 XMP Co-array Fortran 8) Fig. 7 Fig. 7 3 x[] 2 4 y[] 1 3 Fortran XMP CAF CAF codimension UPC 3 c 2011 Information Processing Society of Japan
4 XMP UPC XMP Omni XMP Compiler 9) version 0.5.3TXMPUPC Lawrence Berkeley National Laboratory UC Berkeley Berkeley UPC 10) version BUPC XMP MPI BUPC MPI GASNet 11),12) T2K Tsukuba System Table 2BUPC GASNet API Infiniband APIibv CPU 8 1 CPU 10 BUPC -O3 param max-inline-insns-single=35000 param inline-unit-growth=10000 param large-function-growth= PGAS read/write double 2 20 Block Cyclic TXMP loop Fig. 3BUPC upc forall Fig. 5 Fig. 8 read/write Native TXMP BUPC gcc read/write Fig. 8 TXMP Block Native Cyclic TXMP 2 Table 2 Specifications of each node on experimental environment CPU AMD Opteron Quad-Core 8000 series 2.3GHz (4 sockets) Memory DDR2 667MHz 32GB Network Infiniband DDR (4 rails) 8GB/s OS Linux Compiler gcc MPI mvapich2-1.7a 8 Fig. 8 Access speed in global region Block XMP HPF TXMP 13) BUPC Block Native 3 Cyclic 2 Block Cyclic Block shared Block 14) UPC shared XMP XMP UPC 4.3 Laplace Solver Laplace Solver 4 TXMP Laplace Solver Fig. 9 BUPC Fig Block XMP shadow reflect BUPC TXMP 4 c 2011 Information Processing Society of Japan
5 9 Laplace solver (TXMP) Fig. 9 Source of laplace solver (TXMP) 10 Fig. 10 Laplace solver Result of laplace solver 11 Privatization (UPC) Fig. 11 Sample code of privatization(upc) 12 Laplace solver (BUPC) Fig. 12 Source of laplace solver (BUPC) 4 Fig. 12 THREADS MYTHREAD MPI 0 Laplace Solver 2 UPC 1 SIZE 512 TIMES 100 Fig. 10 Fig. 10 TXMP BUPC 4.2 Block TXMP BUPC 4.4 NAS Parallel Benchmark Conjugate Gradient CG UPC UPC NAS Parallel BenchmarksUPC-NPB 15) XMP UPC w[] UPC-NPB CG (1) (2)(1) (3)(2) 3 (2) Privatization Fig. 11 Privatization Fig SIZE w[] w[] 1 3 w[] SIZE/THREADS w ptr w[] w ptr 16) MPI (1) BUPC-1(2) BUPC-2(3) BUPC-3 CG TXMP Fig. 13 BUPC-1 Fig c 2011 Information Processing Society of Japan
6 13 Conjugate gradient (TXMP) Fig. 13 Source of conjugate gradient (TXMP) 14 Conjugate gradient Fig. 14 Result of conjugate gradient 15 Conjugate gradient (BUPC) Fig. 15 Source of conjugate gradient (BUPC) Table 3 3 CPU CPU time of each implementation Cores TXMP BUPC-1 BUPC Table 4 4 Comm. time of each implementation Cores TXMP BUPC-1 BUPC CG 2 PROC COLS PROCS ROW 2 for w[]bupc-1 w1[] 2 for w[] q[] CLASS C NA CG Fig. 14 BUPC-2 BUPC-3 BUPC-3 2 MPI-CG BUPC-2 TXMP Table 3 CPU Table 4 Table 4 MPI-CG MPI Table 3 Table 4 CPU 1 TXMP BUPC Fig. 13 Fig. 15 TXMP BUPC TXMP w[] BUPC q[] 2, 8, 32, 128 TXMP BUPC 6 c 2011 Information Processing Society of Japan
7 5 XcalableMP Unified Parallel C Table 5 Language features of XcalableMP and Unified Parallel C XcalableMP Unified Parallel C upc forall C, Fortran C 5. C PGAS XMP UPC Table 5 XMP XMP Laplace Solver CG UPC XMP UPC XMP UPC e- XcalableMP 1) XcalableMP Specification DRAFT 0.7, XcalableMP Specification Working Group, ),,. XcalableMP,, Vol.3, No.3, pp , ) UPC Consortium, UPC Language Specifications V1.2, Technical Report LBNL , Berkeley National Lab, spec 1. 2.pdf 4) Bailey, D.H. and et al.: THE NAS PARALLEL BENCHMARKS, Technical Report NAS , Nasa Ames Research Center ) IT e-, go.jp/bmenu/boshu/detail/ /002.htm 6) Unified Parallel C at George Washington University, 7) C.H. Koelbel, D.B. Loverman, R. Shreiber, GL. Steele Jr., M.E. Zosel. The High Performance Fortran Handbook. MIT Press, ) R. Numwich and J. Reid. Co-Array Fortran for parallel programming. Technical Report RAL-TR , Rutherford Appleton Laboratory, ) 10) 11) Christian Bell, Dan Bonachea, Rajesh Nishtala, Katherine Yelick. Optimizing bandwidth limited problems using one-sided communication and overlap. In The 20th Int l Parallel and Distributed Processing Symposium (IPDPS), ) 13) fhpf J.JSSAC, Vol. 11, No. 3,4, pp , ) Wei-Yu Chen, Dan Bonachea, Jason Duell, Parry Husbands, Costin Iancu, Katherine Yelick. A Performance Analysis of the Berkeley UPC Compiler, ICS 03 Proceedings of the 17th annual international conference on Supercomputing, ) 16) El-Ghazawi, T., Chauvin, S. UPC benchmarking issues, Parallel Processing, International Conference, pp , c 2011 Information Processing Society of Japan
HPC Challenge Awards 2010 Class2 XcalableMP Submission
HPC Challenge Awards 2010 Class2 XcalableMP Submission Jinpil Lee, Masahiro Nakao, Mitsuhisa Sato University of Tsukuba Submission Overview XcalableMP Language and model, proposed by XMP spec WG Fortran
More informationAn Extension of XcalableMP PGAS Lanaguage for Multi-node GPU Clusters
An Extension of XcalableMP PGAS Lanaguage for Multi-node Clusters Jinpil Lee, Minh Tuan Tran, Tetsuya Odajima, Taisuke Boku and Mitsuhisa Sato University of Tsukuba 1 Presentation Overview l Introduction
More informationOmni Compiler and XcodeML: An Infrastructure for Source-to- Source Transformation
http://omni compiler.org/ Omni Compiler and XcodeML: An Infrastructure for Source-to- Source Transformation MS03 Code Generation Techniques for HPC Earth Science Applications Mitsuhisa Sato (RIKEN / Advanced
More informationMasahiro Nakao, Hitoshi Murai, Takenori Shimosaka, Mitsuhisa Sato
Masahiro Nakao, Hitoshi Murai, Takenori Shimosaka, Mitsuhisa Sato Center for Computational Sciences, University of Tsukuba, Japan RIKEN Advanced Institute for Computational Science, Japan 2 XMP/C int array[16];
More informationExploring XcalableMP. Shun Liang. August 24, 2012
Exploring XcalableMP Shun Liang August 24, 2012 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2012 Abstract This project has implemented synthetic and application
More informationXcalableMP Implementation and
XcalableMP Implementation and Performance of NAS Parallel Benchmarks Mitsuhisa Sato Masahiro Nakao, Jinpil Lee and Taisuke Boku University of Tsukuba, Japan What s XcalableMP? XcalableMP (XMP for short)
More informationint a[100]; #pragma xmp nodes p[*] #pragma xmp template t[100] #pragma xmp distribute t[block] onto p #pragma xmp align a[i] with t[i]
2 3 4 int a[100]; #pragma xmp nodes p[*] #pragma xmp template t[100] #pragma xmp distribute t[block] onto p #pragma xmp align a[i] with t[i] integer :: a(100)!$xmp nodes p(*)!$xmp template t(100)!$xmp
More informationGPU GPU CPU. Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3
/CPU,a),2,2 2,2 Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3 XMP XMP-dev CPU XMP-dev/StarPU XMP-dev XMP CPU StarPU CPU /CPU XMP-dev/StarPU N /CPU CPU. Graphics Processing Unit GP General-Purpose
More informationPerformance Comparison between Two Programming Models of XcalableMP
Performance Comparison between Two Programming Models of XcalableMP H. Sakagami Fund. Phys. Sim. Div., National Institute for Fusion Science XcalableMP specification Working Group (XMP-WG) Dilemma in Parallel
More informationUnified Runtime for PGAS and MPI over OFED
Unified Runtime for PGAS and MPI over OFED D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University, USA Outline Introduction
More informationC++ T2K NPB. An Implementation and NPB Evaluation of C++ Task Allocation Library (tpdplib) on T2K Open Supercomputer
C++ tpdplib T2K NPB 1 1 GPGPU C++ tpdplib T2K An Implementation and NPB Evaluation of C++ Task Allocation Library (tpdplib) on T2K Open Supercomputer Takeo Yamasaki 1 and Nakayama Masaya 1 Modern computing
More informationUnifying UPC and MPI Runtimes: Experience with MVAPICH
Unifying UPC and MPI Runtimes: Experience with MVAPICH Jithin Jose Miao Luo Sayantan Sur D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
More informationWhat is Stencil Computation?
Model Checking Stencil Computations Written in a Partitioned Global Address Space Language Tatsuya Abe, Toshiyuki Maeda, and Mitsuhisa Sato RIKEN AICS HIPS 13 May 20, 2013 What is Stencil Computation?
More informationIPSJ SIG Technical Report Vol.2014-HPC-145 No /7/29 XcalableMP FFT 1 1 1,2 HPC PGAS XcalableMP XcalableMP G-FFT 90.6% 186.6TFLOPS XMP MPI
XcalableMP FFT, HPC PGAS XcalableMP XcalableMP 89 G-FFT 9.6% 86.6TFLOPS XMP MPI. Fourier (FFT) MPI [] Partitioned Global Address Space (PGAS) FFT PGAS PGAS XcalableMP(XMP)[] C Fortran XMP HPC [] Global-FFT
More informationDEVELOPING AN OPTIMIZED UPC COMPILER FOR FUTURE ARCHITECTURES
DEVELOPING AN OPTIMIZED UPC COMPILER FOR FUTURE ARCHITECTURES Tarek El-Ghazawi, François Cantonnet, Yiyi Yao Department of Electrical and Computer Engineering The George Washington University tarek@gwu.edu
More informationBerkeley UPC SESSION 3: Library Extensions
SESSION 3: Library Extensions Christian Bell, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome, Kathy Yelick U.C. Berkeley / LBNL 16 Library Extensions
More informationImplementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS
Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP October 29, 2015 Hidetoshi Iwashita, RIKEN AICS Background XMP Contains Coarray Features XcalableMP (XMP) A PGAS language,
More informationAffine Loop Optimization using Modulo Unrolling in CHAPEL
Affine Loop Optimization using Modulo Unrolling in CHAPEL Aroon Sharma, Joshua Koehler, Rajeev Barua LTS POC: Michael Ferguson 2 Overall Goal Improve the runtime of certain types of parallel computers
More informationMPI_Send(a,..., MPI_COMM_WORLD); MPI_Recv(a,..., MPI_COMM_WORLD, &status);
$ $ 2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size);
More informationA Local-View Array Library for Partitioned Global Address Space C++ Programs
Lawrence Berkeley National Laboratory A Local-View Array Library for Partitioned Global Address Space C++ Programs Amir Kamil, Yili Zheng, and Katherine Yelick Lawrence Berkeley Lab Berkeley, CA, USA June
More informationPerformance Evaluation for Omni XcalableMP Compiler on Many-core Cluster System based on Knights Landing
ABSTRACT Masahiro Nakao RIKEN Advanced Institute for Computational Science Hyogo, Japan masahiro.nakao@riken.jp Taisuke Boku Center for Computational Sciences University of Tsukuba Ibaraki, Japan To reduce
More informationAutomatic Nonblocking Communication for Partitioned Global Address Space Programs
Automatic Nonblocking Communication for Partitioned Global Address Space Programs Wei-Yu Chen 1,2 wychen@cs.berkeley.edu Costin Iancu 2 cciancu@lbl.gov Dan Bonachea 1,2 bonachea@cs.berkeley.edu Katherine
More informationPartitioned Global Address Space (PGAS) Model. Bin Bao
Partitioned Global Address Space (PGAS) Model Bin Bao Contents PGAS model introduction Unified Parallel C (UPC) introduction Data Distribution, Worksharing and Exploiting Locality Synchronization and Memory
More informationUnified Parallel C (UPC) Katherine Yelick NERSC Director, Lawrence Berkeley National Laboratory EECS Professor, UC Berkeley
Unified Parallel C (UPC) Katherine Yelick NERSC Director, Lawrence Berkeley National Laboratory EECS Professor, UC Berkeley Berkeley UPC Team Current UPC Team Filip Blagojevic Dan Bonachea Paul Hargrove
More informationMPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); double a[100]; #pragma acc data copy(a) { #pragma acc parallel loop for(i=0;i<100;i++)
2 MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); double a[100]; #pragma acc data copy(a) { #pragma acc parallel loop for(i=0;i
More informationPortable, MPI-Interoperable! Coarray Fortran
Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer
More informationArchitectural Trends and Programming Model Strategies for Large-Scale Machines
Architectural Trends and Programming Model Strategies for Large-Scale Machines Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab http://titanium.cs.berkeley.edu http://upc.lbl.gov 1 Kathy
More informationA Characterization of Shared Data Access Patterns in UPC Programs
A Characterization of Shared Data Access Patterns in UPC Programs Christopher Barton 1, Călin Caşcaval, and José Nelson Amaral 1 1 Department of Computing Science, University of Alberta, Edmonton, Canada
More informationYasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors
Performance Evaluation of Large-scale Parallel Simulation Codes and Designing New Language Features on the (High Performance Fortran) Data-Parallel Programming Environment Project Representative Yasuo
More informationEnforcing Textual Alignment of
Parallel Hardware Parallel Applications IT industry (Silicon Valley) Parallel Software Users Enforcing Textual Alignment of Collectives using Dynamic Checks and Katherine Yelick UC Berkeley Parallel Computing
More informationLLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS Programs nd Workshop on the LLVM Compiler Infrastructure in HPC @ SC15 Akihiro Hayashi (Rice University) Jisheng Zhao (Rice University) Michael Ferguson
More informationSoftware Distributed Shared Memory with High Bandwidth Network: Production and Evaluation
,,.,, InfiniBand PCI Express,,,. Software Distributed Shared Memory with High Bandwidth Network: Production and Evaluation Akira Nishida, The recent development of commodity hardware technologies makes
More informationLecture 32: Partitioned Global Address Space (PGAS) programming models
COMP 322: Fundamentals of Parallel Programming Lecture 32: Partitioned Global Address Space (PGAS) programming models Zoran Budimlić and Mack Joyner {zoran, mjoyner}@rice.edu http://comp322.rice.edu COMP
More informationPerformance Analysis Framework for GASNet Middleware, Tools, and Applications
Performance Analysis Framework for GASNet Middleware, Tools, and Applications Prashanth Prakash Max Billingsley III Alan D. George Vikas Aggarwal High-performance Computing and Simulation (HCS) Research
More informationA Characterization of Shared Data Access Patterns in UPC Programs
IBM T.J. Watson Research Center A Characterization of Shared Data Access Patterns in UPC Programs Christopher Barton, Calin Cascaval, Jose Nelson Amaral LCPC `06 November 2, 2006 Outline Motivation Overview
More informationMulti-Threaded UPC Runtime for GPU to GPU communication over InfiniBand
Multi-Threaded UPC Runtime for GPU to GPU communication over InfiniBand Miao Luo, Hao Wang, & D. K. Panda Network- Based Compu2ng Laboratory Department of Computer Science and Engineering The Ohio State
More informationComputer Science Technical Report
Computer Science Technical Report A Performance Model for Unified Parallel C Zhang Zhang Michigan Technological University Computer Science Technical Report CS-TR-7-4 June, 27 Department of Computer Science
More informationRuntime Address Space Computation for SDSM Systems
Runtime Address Space Computation for SDSM Systems Jairo Balart Outline Introduction Inspector/executor model Implementation Evaluation Conclusions & future work 2 Outline Introduction Inspector/executor
More informationComparative Performance Analysis of RDMA-Enhanced Ethernet
Comparative Performance Analysis of RDMA-Enhanced Ethernet Casey B. Reardon and Alan D. George HCS Research Laboratory University of Florida Gainesville, FL July 24, 2005 Clement T. Cole Ammasso Inc. Boston,
More informationPGAS Languages (Par//oned Global Address Space) Marc Snir
PGAS Languages (Par//oned Global Address Space) Marc Snir Goal Global address space is more convenient to users: OpenMP programs are simpler than MPI programs Languages such as OpenMP do not provide mechanisms
More informationOptimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor
Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Intel K. K. E-mail: hirokazu.kobayashi@intel.com Yoshifumi Nakamura RIKEN AICS E-mail: nakamura@riken.jp Shinji Takeda
More informationLinkage of XcalableMP and Python languages for high productivity on HPC cluster system
Linkage of XcalableMP and Python languages for high productivity on HPC cluster system Masahiro Nakao (RIKEN Center for Computational Science) 6th XMP Workshop@University of Tsukuba, Japan Background XcalableMP
More informationA C compiler for Large Data Sequential Processing using Remote Memory
A C compiler for Large Data Sequential Processing using Remote Memory Shiyo Yoshimura, Hiroko Midorikawa Graduate School of Science and Technology, Seikei University, Tokyo, Japan E-mail:dm106231@cc.seikei.ac.jp,
More informationEfficient Data Race Detection for Unified Parallel C
P A R A L L E L C O M P U T I N G L A B O R A T O R Y Efficient Data Race Detection for Unified Parallel C ParLab Winter Retreat 1/14/2011" Costin Iancu, LBL" Nick Jalbert, UC Berkeley" Chang-Seo Park,
More informationUnified Parallel C (UPC)
Unified Parallel C (UPC) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 21 March 27, 2008 Acknowledgments Supercomputing 2007 tutorial on Programming using
More informationOverlapping Computation and Communication for Advection on Hybrid Parallel Computers
Overlapping Computation and Communication for Advection on Hybrid Parallel Computers James B White III (Trey) trey@ucar.edu National Center for Atmospheric Research Jack Dongarra dongarra@eecs.utk.edu
More informationCluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system
123 Cluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system Mitsuhisa Sato a, Hiroshi Harada a, Atsushi Hasegawa b and Yutaka Ishikawa a a Real World Computing
More informationPorting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT
Porting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT Paul Hargrove Dan Bonachea, Michael Welcome, Katherine Yelick UPC Review. July 22, 2009. What is GASNet?
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Amith Mamidala Abhinav Vishnu Dhabaleswar K Panda Department of Computer and Science and Engineering The Ohio State University Columbus,
More informationUPC: A Portable High Performance Dialect of C
UPC: A Portable High Performance Dialect of C Kathy Yelick Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Wei Tu, Mike Welcome Parallelism on the Rise
More informationUsing a Cluster as a Memory Resource: A Fast and Large Virtual Memory on MPI
Using a Cluster as a Memory Resource: A Fast and Large Virtual Memory on MPI DLM: Distributed Large Memory System IEEE Cluster 2009, New Orleans, Aug.31- Sep.4 Hiroko Midorikawa, Kazuhiro Saito Seikei
More informationUPC-CHECK: A scalable tool for detecting run-time errors in Unified Parallel C
myjournal manuscript No. (will be inserted by the editor) UPC-CHECK: A scalable tool for detecting run-time errors in Unified Parallel C James Coyle Indranil Roy Marina Kraeva Glenn R. Luecke Received:
More informationEvaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures
Evaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures Konstantin Berlin 1,JunHuan 2, Mary Jacob 3, Garima Kochhar 3,JanPrins 2, Bill
More informationOn the Performance and Energy Efficiency of the PGAS Programming Model on Multicore Architectures
On the and Energy Efficiency of the PGAS Programming Model on Multicore Architectures Jérémie Lagravière & Johannes Langguth Simula Research Laboratory NO-1364 Fornebu, Norway jeremie@simula.no langguth@simula.no
More informationProductivity and Performance Using Partitioned Global Address Space Languages
Productivity and Performance Using Partitioned Global Address Space Languages Katherine Yelick 1,2, Dan Bonachea 1,2, Wei-Yu Chen 1,2, Phillip Colella 2, Kaushik Datta 1,2, Jason Duell 1,2, Susan L. Graham
More informationOpenMP on the FDSM software distributed shared memory. Hiroya Matsuba Yutaka Ishikawa
OpenMP on the FDSM software distributed shared memory Hiroya Matsuba Yutaka Ishikawa 1 2 Software DSM OpenMP programs usually run on the shared memory computers OpenMP programs work on the distributed
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationParallel Programming Languages. HPC Fall 2010 Prof. Robert van Engelen
Parallel Programming Languages HPC Fall 2010 Prof. Robert van Engelen Overview Partitioned Global Address Space (PGAS) A selection of PGAS parallel programming languages CAF UPC Further reading HPC Fall
More informationPortable, MPI-Interoperable! Coarray Fortran
Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer
More informationOpenSHMEM Performance and Potential: A NPB Experimental Study
OpenSHMEM Performance and Potential: A NPB Experimental Study Swaroop Pophale, Ramachandra Nanjegowda, Tony Curtis, Barbara Chapman University of Houston Houston, Texas 77004 Haoqiang Jin NASA Ames Research
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
More informationMiwako TSUJI XcalableMP
Miwako TSUJI AICS 2014.10.24 2 XcalableMP 2010.09 2014.03 2013.10.25 AKIHABARA FP2C (Framework for Post-Petascale Computing) YML + XMP(-dev) + StarPU integrated developed in Japan and in France Experimental
More informationPage Replacement Algorithm using Swap-in History for Remote Memory Paging
Page Replacement Algorithm using Swap-in History for Remote Memory Paging Kazuhiro SAITO Hiroko MIDORIKAWA and Munenori KAI Graduate School of Engineering, Seikei University, 3-3-, Kichijoujikita-machi,
More informationProgramming Environment Research Team
Chapter 2 Programming Environment Research Team 2.1 Members Mitsuhisa Sato (Team Leader) Hitoshi Murai (Research Scientist) Miwako Tsuji (Research Scientist) Masahiro Nakao (Research Scientist) Jinpil
More informationParallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012
Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2
More informationAPPLICATION OF PARALLEL ARRAYS FOR SEMIAUTOMATIC PARALLELIZATION OF FLOW IN POROUS MEDIA PROBLEM SOLVER
Mathematical Modelling and Analysis 2005. Pages 171 177 Proceedings of the 10 th International Conference MMA2005&CMAM2, Trakai c 2005 Technika ISBN 9986-05-924-0 APPLICATION OF PARALLEL ARRAYS FOR SEMIAUTOMATIC
More informationParallel Programming with Coarray Fortran
Parallel Programming with Coarray Fortran SC10 Tutorial, November 15 th 2010 David Henty, Alan Simpson (EPCC) Harvey Richardson, Bill Long, Nathan Wichmann (Cray) Tutorial Overview The Fortran Programming
More informationSteve Deitz Chapel project, Cray Inc.
Parallel Programming in Chapel LACSI 2006 October 18 th, 2006 Steve Deitz Chapel project, Cray Inc. Why is Parallel Programming Hard? Partitioning of data across processors Partitioning of tasks across
More informationParallel Languages: Past, Present and Future
Parallel Languages: Past, Present and Future Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab 1 Kathy Yelick Internal Outline Two components: control and data (communication/sharing) One
More informationROSE-CIRM Detecting C-Style Errors in UPC Code
ROSE-CIRM Detecting C-Style Errors in UPC Code Peter Pirkelbauer 1 Chunhuah Liao 1 Thomas Panas 2 Daniel Quinlan 1 1 2 Microsoft Parallel Data Warehouse This work was funded by the Department of Defense
More informationHierarchical Computation in the SPMD Programming Model
Hierarchical Computation in the SPMD Programming Model Amir Kamil Katherine Yelick Computer Science Division, University of California, Berkeley {kamil,yelick@cs.berkeley.edu Abstract. Large-scale parallel
More informationComparing One-Sided Communication with MPI, UPC and SHMEM
Comparing One-Sided Communication with MPI, UPC and SHMEM EPCC University of Edinburgh Dr Chris Maynard Application Consultant, EPCC c.maynard@ed.ac.uk +44 131 650 5077 The Future ain t what it used to
More informationParallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C
Parallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C Tarek El-Ghazawi, Olivier Serres, Samy Bahra, Miaoqing Huang and Esam El-Araby Department of Electrical
More informationADVANCED PGAS CENTRIC USAGE OF THE OPENFABRICS INTERFACE
13 th ANNUAL WORKSHOP 2017 ADVANCED PGAS CENTRIC USAGE OF THE OPENFABRICS INTERFACE Erik Paulson, Kayla Seager, Sayantan Sur, James Dinan, Dave Ozog: Intel Corporation Collaborators: Howard Pritchard:
More informationPost-Petascale Computing. Mitsuhisa Sato
Challenges on Programming Models and Languages for Post-Petascale Computing -- from Japanese NGS project "The K computer" to Exascale computing -- Mitsuhisa Sato Center for Computational Sciences (CCS),
More informationSystem Software Stack for the Next Generation High-Performance Computers
1,2 2 Gerofi Balazs 1 3 2 4 4 5 6 7 7 PC CPU PC OS MPI I/O System Software Stack for the Next Generation High-Performance Computers Yutaka Ishikawa 1,2 Atsushi Hori 2 Gerofi Balazs 1 Masamichi Takagi 3
More informationImplementation and Performance Analysis of Non-Blocking Collective Operations for MPI
Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI T. Hoefler 1,2, A. Lumsdaine 1 and W. Rehm 2 1 Open Systems Lab 2 Computer Architecture Group Indiana University Technical
More informationThe Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread
The Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread Workshop DSM2005 CCGrig2005 Seikei University Tokyo, Japan midori@st.seikei.ac.jp Hiroko Midorikawa 1 Outline
More informationJ. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst
Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationEvaluating the Portability of UPC to the Cell Broadband Engine
Evaluating the Portability of UPC to the Cell Broadband Engine Dipl. Inform. Ruben Niederhagen JSC Cell Meeting CHAIR FOR OPERATING SYSTEMS Outline Introduction UPC Cell UPC on Cell Mapping Compiler and
More informationOpenMPI OpenMP like tool for easy programming in MPI
OpenMPI OpenMP like tool for easy programming in MPI Taisuke Boku 1, Mitsuhisa Sato 1, Masazumi Matsubara 2, Daisuke Takahashi 1 1 Graduate School of Systems and Information Engineering, University of
More informationLeveraging C++ Meta-programming Capabilities to Simplify the Message Passing Programming Model
Leveraging C++ Meta-programming Capabilities to Simplify the Message Passing Programming Model Simone Pellegrini, Radu Prodan, and Thomas Fahringer University of Innsbruck Distributed and Parallel Systems
More informationScaleUPC: A UPC Compiler for Multi-Core Systems
ScaleUPC: A UPC Compiler for Multi-Core Systems Weiming Zhao Department of Computer Science Michigan Technological University wezhao@mtu.edu Zhenlin Wang Department of Computer Science Michigan Technological
More informationCompilation Techniques for Partitioned Global Address Space Languages
Compilation Techniques for Partitioned Global Address Space Languages Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab http://titanium.cs.berkeley.edu http://upc.lbl.gov 1 Kathy Yelick
More informationCo-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University
Co-array Fortran Performance and Potential: an NPB Experimental Study Cristian Coarfa Jason Lee Eckhardt Yuri Dotsenko John Mellor-Crummey Department of Computer Science Rice University Parallel Programming
More informationCompilation Techniques for Partitioned Global Address Space Languages
Compilation Techniques for Partitioned Global Address Space Languages Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab http://titanium.cs.berkeley.edu http://upc.lbl.gov 1 Kathy Yelick
More informationHierarchical Pointer Analysis
for Distributed Programs Amir Kamil and Katherine Yelick U.C. Berkeley August 23, 2007 1 Amir Kamil Background 2 Amir Kamil Hierarchical Machines Parallel machines often have hierarchical structure 4 A
More informationCIRM - Dynamic Error Detection
CIRM - Dynamic Error Detection Peter Pirkelbauer Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory This work was funded by the Department of Defense and used elements
More informationRuntime Correctness Checking for Emerging Programming Paradigms
(protze@itc.rwth-aachen.de), Christian Terboven, Matthias S. Müller, Serge Petiton, Nahid Emad, Hitoshi Murai and Taisuke Boku RWTH Aachen University, Germany University of Tsukuba / RIKEN, Japan Maison
More informationA Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co-
Shaun Lindsay CS425 A Comparison of Unified Parallel C, Titanium and Co-Array Fortran The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Array Fortran s methods of parallelism
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Section 5. Victor Gergel, Professor, D.Sc. Lobachevsky State University of Nizhni Novgorod (UNN) Contents (CAF) Approaches to parallel programs development Parallel
More informationUPC Performance Evaluation on a Multicore System
Performance Evaluation on a Multicore System Damián A. Mallón, J. Carlos Mouriño, Andrés Gómez Galicia Supercomputing Center Santiago de Compostela, Spain {dalvarez,jmourino,agomez}@cesga.es Guillermo
More informationAmazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud
Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud Summarized by: Michael Riera 9/17/2011 University of Central Florida CDA5532 Agenda
More informationPortable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI
Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Huansong Fu*, Manjunath Gorentla Venkata, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline
More informationHigh Performance Fortran. James Curry
High Performance Fortran James Curry Wikipedia! New Fortran statements, such as FORALL, and the ability to create PURE (side effect free) procedures Compiler directives for recommended distributions of
More informationPerformance Analysis, Modeling and Tuning at Scale. Katherine Yelick
C O M P U T A T I O N A L R E S E A R C H D I V I S I O N Performance Analysis, Modeling and Tuning at Scale Katherine Yelick Berkeley Institute for Performance Studies Lawrence Berkeley National Lab and
More informationScaling with PGAS Languages
Scaling with PGAS Languages Panel Presentation at OFA Developers Workshop (2013) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda
More informationProceedings of the GCC Developers Summit
Reprinted from the Proceedings of the GCC Developers Summit June 17th 19th, 2008 Ottawa, Ontario Canada Conference Organizers Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering
More informationPerformance without Pain = Productivity Data Layout and Collective Communication in UPC
Performance without Pain = Productivity Data Layout and Collective Communication in UPC By Rajesh Nishtala (UC Berkeley), George Almási (IBM Watson Research Center), Călin Caşcaval (IBM Watson Research
More information