page migration Implementation and Evaluation of Dynamic Load Balancing Using Runtime Performance Monitoring on Omni/SCASH
|
|
- Irma Black
- 5 years ago
- Views:
Transcription
1 Omni/SCASH heterogeneity Omni/SCASH page migration Implementation and Evaluation of Dynamic Load Balancing Using Runtime Performance Monitoring on Omni/SCASH Yoshiaki Sakae, 1 Satoshi Matsuoka, 2 Mitsuhisa Sato 3 and Hiroshi Harada 4 In the commodity cluster environment, there may be performance heterogeneity between nodes due to several reasons, from which the application programmer will suffer as load imbalances. To overcome these problems, some dynamic load balancing mechanisms are needed. In this paper, we report our ongoing work on dynamic load balancing extension to Omni/SCASH. Using our dynamic load balancing mechanisms, we expect that programmers can have load imbalances adjusted automatically by the runtime system without explicit definition of data and task placements in a commodity cluster environment with possibly heterogeneous performance nodes. The results of evaluation indicates that, our loop re-partitioning scheme which is one of the dynamic load balancing extension works well, and also it is important to combine loop re-partition with dynamic page migration. 1. HPC SMP 9) 1 Tokyo Institute of Technology 2 / JST Tokyo Institute of Technology / JST 3 Tsukuba University 4 Hewlett-Packard Japan,Ltd. heterogeneity heterogeneity OpenMP 1
2 SDSM SCASH OpenMP Omni/SCASH 7) Omni/SCASH SCASH page fault counting Omni OpenMP Omni OpenMP OpenMP C translator C-front F-front C Fortran Xobject Xobject AST (Abstract Syntax Tree) Java Exc Java tools Exc Java tools Xobject Java OpenMP C Exc Java tools C 2.2 SCASH SCASH 3) PM 8) OS Software Distributed Shared Memory System SCASH Eager Release Consistency (ERC) invalidate update SCASH home base home base home base home home 2.3 Translation of OpenMP programs to SCASH OpenMP SCASH OpenMP SCASH shemem OpenMP SCASH OpenMP parallel fork-join parallel fork Omni/SCASH nested parallelism SCASH OpenMP OpenMP e.g. SCASH 3. 1 : OpenMP dynamic guided chunk queue 2
3 Example: Num Proc = 3, Perf Ratio = 3:1:2 schedule(profiled) #pragma parallel omp for schedule(profiled) for (i = 0; i < N; i++) { LOOP_BODY; 1 3n n 2n schedule(profiled, 2) schedule(profiled, 2, 3, 1) : executed on proc 0 : executed on proc 1 1 : executed on proc 2 : profiling loop Example of profiled scheduling profiled chunk profiled OpenMP schedule clause schedule(profiled[, chunk_size[, eval_size[, eval_skip]]]) profiled eval skip eval size chunk size eval skip, eval size, chunk size eval skip 0 eval size eval size = 1 chunk size 0 1 profiled 3 3 : 1 : 2 profiled Omni Omni 2 PAPI 2) static void ompc_func(void ** ompc_args){ int i; { int lb, ub, step; double start_time = 0.0, stop_time = 0.0; lb = 0, ub = N, step = 1; _ompc_profiled_sched_init(lb, ub, step, 1); while (_ompc_profiled_sched_next(&lb, &ub, start_time, stop_time)) { _ompc_profiled_get_time(&start_time); for (i = lb; i < ub; i += step) { LOOP_BODY; _ompc_profiled_get_time(&stop_time); 2 profiled CPU ompc profiled sched init() ompc profiled sched next() ompc profiled sched next() ompc profiled get time() 3 ompc profiled sched next() chunk queue dynamic, guided 4. profiled NAS Parallel Benchmarks (NPB-2.3) EP, CG OpenMP Fortran OpenMP RWCP EP EP 2n 3
4 Execution Time of EP Class S on Homogeneous Settings < = > /? static dynamic profiled Time [sec] "! # < = > /? Number of Nodes Execution Time of EP Class S on Homogeneous Settings (Pentium III nodes only ) ( ) * +, -. / ( "7;8: 1 3 $ % & ' ( ) * +, -. / ( 0 1 < 09A = = BDC = E F ( ) * +, -. / ( "798: : Performance Heterogeneous Cluster Fast nodes Slow node CPU Pentium III 500MHz Celeron 300MHz Cache 512KB 128KB Chipset Intel 440BX Same as Fast Memory SDRAM 512MB Same as Fast NIC Myrinet M2M-PCI32C Same as Fast Embarassingly Parallel profiled EP CG CG Conjugate Gradient CG unstructured grid computation CG profiled CPU Pentium III 500MHz Celeron 300MHz Fast, Slow OS RedHat 7.2 SCore gcc O profiled chunk size, eval size, eval skip 4.2 profiled EP 4 dynamic chunk profiled static static, dynamic, profiled EP class S Fast static EP static EP EP chunk queue dynamic static profiled EP 4
5 Execution Time of EP Class S on Heterogeneous Settings static (Slow x 1 + Fast) dynamic (Slow x 1 + Fast) profiled (Slow x 1 + Fast) static (Fast x 4) 2 Memory Behavior of the CG Class A on Celeron 1 + Pentium III 3 settings for static/profiled scheduling static profiled L2 miss ratio 29.6% 31.1% Page Fault at SCASH Barrier Time [sec] Time [sec] Number of Nodes Execution Time of EP Class S on Heterogeneous Settings (one Celeron node + Pentium III nodes) Execution Time of CG Class A on Heterogeneous Settings Number of Nodes static (Slow x 1 + Fast) profiled (Slow x 1 + Fast) Execution Time of CG Class A on Heterogeneous Settings (one Celeron node + Pentium III nodes) chunk vector EP profilied 6 CG class A static profiled CG kernel parallel for affinity CG static profiled profiled 4.4 profiled 2 Slow node 1 Fast node 3 4 CG Class A static profiled L2 cache miss ratio SCASH page fault L2 cache miss ratio PAPI : PAPI L2 TCM L2 cache miss ratio = PAPI L1 DCM 100 L2 cache miss ratio static, profiled SCASH page fault profiled static profiled affinity profiled : profiled page migration page migration page migration stable 5. Ongoing Work: Page Fault Conting Page Migration CG SDSM 5
6 OpenMP page reference counter 6) 1),4),5) SDSM SDSM page fault SPMD 6. Software Distributed Shared Memory, SCASH OpenMP Omni/SCASH heterogeneity page fault conting EP profiled static CG SDSM profiled SDSM page fault counting Omni/SCASH profiled 1) J. Bircsak, P. Craig, R. Crowell, J. Harris, C. A. Nelson, and C. D. Offner. Extending OpenMP for NUMA Machines: The Language. In Proceedings of Workshop on OpenMP Applications and Tool (WOMPAT 2000), July San Diego, USA. 2) S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A Portable Programming Interface for Performance Evaluation on Modern Processors. The International Journal of High Performance Computing Applications, 14(3): , Fall ) H. Harada, H. Tezuka, A. Hori, S. Sumimoto, T. Takahashi, and Y. Ishikawa. SCASH: Software DSM using High Performance Network on Commodity Hardware and Software. In Proceedings of Eighth Workshop on Scalable Shared-memory Multiprocessors, pages ACM, May ) A. Hasegawa, M. Sato, Y. Ishikawa, and H. Harada. Optimization and Performance Evaluation of NPB on Omni OpenMP Compiler for SCASH, Software Distributed Memory System (in Japanese). In IPSJ SIG Notes, 2001-ARC-142, 2001-HPC-85, pages , Mar ) J. Merlin. Distributed OpenMP: Extensions to OpenMP for SMP Clusters. In Invited Talk. Second European Workshop on OpenMP (EWOMP 00), Oct Edinburgh, Scotland. 6) D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé. Is Data Distribution Necessary in OpenMP? In Proc. of Supercomputing 2000, Nov Dallas, TX. 7) M. Sato, H. Harada, and Y. Ishikawa. OpenMP compiler for Software Distributed Shared Memory System SCASH. In Proceedings of Workshop on OpenMP Applications and Tool (WOMPAT 2000), July San Diego, USA. 8) H. Tezuka, A. Hori, Y. Ishikawa, and M. Sato. PM: An Operating System Coordinated High Performance Communication Library. In P. Sloot and B. Hertzberger, editors, High-Performance Computing and Networking 97, volume 1225, pages Lecture Notes in Computer Science, Apr ) 6
Preliminary Evaluation of Dynamic Load Balancing Using Loop Re-partitioning on Omni/SCASH
Preliminary Evaluation of Dynamic Load Balancing Using Loop Re-partitioning on Omni/SCASH Yoshiaki Sakae Tokyo Institute of Technology, Japan sakae@is.titech.ac.jp Mitsuhisa Sato Tsukuba University, Japan
More informationCluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system
123 Cluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system Mitsuhisa Sato a, Hiroshi Harada a, Atsushi Hasegawa b and Yutaka Ishikawa a a Real World Computing
More informationOpenMP on the FDSM software distributed shared memory. Hiroya Matsuba Yutaka Ishikawa
OpenMP on the FDSM software distributed shared memory Hiroya Matsuba Yutaka Ishikawa 1 2 Software DSM OpenMP programs usually run on the shared memory computers OpenMP programs work on the distributed
More informationPM2: High Performance Communication Middleware for Heterogeneous Network Environments
PM2: High Performance Communication Middleware for Heterogeneous Network Environments Toshiyuki Takahashi, Shinji Sumimoto, Atsushi Hori, Hiroshi Harada, and Yutaka Ishikawa Real World Computing Partnership,
More informationImproving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers Henrik Löf, Markus Nordén, and Sverker Holmgren Uppsala University, Department of Information Technology P.O. Box
More informationThe Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread
The Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread Workshop DSM2005 CCGrig2005 Seikei University Tokyo, Japan midori@st.seikei.ac.jp Hiroko Midorikawa 1 Outline
More informationFig. 1. Omni OpenMP compiler
Performance Evaluation of the Omni OpenMP Compiler Kazuhiro Kusano, Shigehisa Satoh and Mitsuhisa Sato RWCP Tsukuba Research Center, Real World Computing Partnership 1-6-1, Takezono, Tsukuba-shi, Ibaraki,
More informationOmni OpenMP compiler. C++ Frontend. C- Front. F77 Frontend. Intermediate representation (Xobject) Exc Java tool. Exc Tool
Design of OpenMP Compiler for an SMP Cluster Mitsuhisa Sato, Shigehisa Satoh, Kazuhiro Kusano and Yoshio Tanaka Real World Computing Partnership, Tsukuba, Ibaraki 305-0032, Japan E-mail:fmsato,sh-sato,kusano,yoshiog@trc.rwcp.or.jp
More informationRecently, symmetric multiprocessor systems have become
Global Broadcast Argy Krikelis Aspex Microsystems Ltd. Brunel University Uxbridge, Middlesex, UK argy.krikelis@aspex.co.uk COMPaS: a PC-based SMP cluster Mitsuhisa Sato, Real World Computing Partnership,
More informationOpenMP on Distributed Memory via Global Arrays
1 OpenMP on Distributed Memory via Global Arrays Lei Huang a, Barbara Chapman a, and Ricky A. Kall b a Dept. of Computer Science, University of Houston, Texas. {leihuang,chapman}@cs.uh.edu b Scalable Computing
More informationData Distribution, Migration and Replication on a cc-numa Architecture
Data Distribution, Migration and Replication on a cc-numa Architecture J. Mark Bull and Chris Johnson EPCC, The King s Buildings, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, Scotland,
More informationImproving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support
Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support Ranjit Noronha and D.K. Panda Department of Computer Science and Engineering The Ohio State University Columbus,
More informationOmniRPC: a Grid RPC facility for Cluster and Global Computing in OpenMP
OmniRPC: a Grid RPC facility for Cluster and Global Computing in OpenMP (extended abstract) Mitsuhisa Sato 1, Motonari Hirano 2, Yoshio Tanaka 2 and Satoshi Sekiguchi 2 1 Real World Computing Partnership,
More informationTowards a Lightweight HPF Compiler
Towards a Lightweight HPF Compiler Hidetoshi Iwashita 1, Kohichiro Hotta 1, Sachio Kamiya 1, and Matthijs van Waveren 2 1 Strategy and Technology Division, Software Group Fujitsu Ltd. 140 Miyamoto, Numazu-shi,
More informationDetection and Analysis of Iterative Behavior in Parallel Applications
Detection and Analysis of Iterative Behavior in Parallel Applications Karl Fürlinger and Shirley Moore Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More informationOmni Compiler and XcodeML: An Infrastructure for Source-to- Source Transformation
http://omni compiler.org/ Omni Compiler and XcodeML: An Infrastructure for Source-to- Source Transformation MS03 Code Generation Techniques for HPC Earth Science Applications Mitsuhisa Sato (RIKEN / Advanced
More informationApproaches to Performance Evaluation On Shared Memory and Cluster Architectures
Approaches to Performance Evaluation On Shared Memory and Cluster Architectures Peter Strazdins (and the CC-NUMA Team), CC-NUMA Project, Department of Computer Science, The Australian National University
More informationExploiting Memory Affinity in OpenMP through Schedule Reuse
Exploiting Memory Affinity in OpenMP through Schedule Reuse D.S. Nikolopoulos Coordinated Sciences Laboratory University of Illinois at Urbana-Champaign 1308 W. Main Street, MC-228 Urbana, IL, 61801, USA
More informationShared Memory Parallel Programming. Shared Memory Systems Introduction to OpenMP
Shared Memory Parallel Programming Shared Memory Systems Introduction to OpenMP Parallel Architectures Distributed Memory Machine (DMP) Shared Memory Machine (SMP) DMP Multicomputer Architecture SMP Multiprocessor
More informationNanos Mercurium: a Research Compiler for OpenMP
Nanos Mercurium: a Research Compiler for OpenMP J. Balart, A. Duran, M. Gonzàlez, X. Martorell, E. Ayguadé and J. Labarta Computer Architecture Department, Technical University of Catalonia, cr. Jordi
More informationOpenMP for Clusters. Lei Huang*, Barbara Chapman*, Ricky Kendall +
for Clusters Lei Huang*, Barbara Chapman*, Ricky Kendall + *Dept. of Computer Science, University of Houston Texas + Scalable Computing Laboratory, Ames Laboratory, Iowa {leihuang, chapman}@cs.uh.edu,
More informationSwitch. Switch. PU: Pentium Pro 200MHz Memory: 128MB Myricom Myrinet 100Base-T Ethernet
COMPaS: A Pentium Pro PC-based SMP Cluster and its Experience Yoshio Tanaka 1, Motohiko Matsuda 1, Makoto Ando 1, Kazuto Kubota and Mitsuhisa Sato 1 Real World Computing Partnership fyoshio,matu,ando,kazuto,msatog@trc.rwcp.or.jp
More informationCOMP Parallel Computing. SMM (2) OpenMP Programming Model
COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationThe Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread
The Performance Analysis of Portable Parallel Programming Interface MpC for SDSM and pthread Hiroko Midorikawa Department of Computer and Information Science Seikei University -- Kichijouji kita-machi,
More informationSoftware Distributed Shared Memory with High Bandwidth Network: Production and Evaluation
,,.,, InfiniBand PCI Express,,,. Software Distributed Shared Memory with High Bandwidth Network: Production and Evaluation Akira Nishida, The recent development of commodity hardware technologies makes
More informationMulticore Performance and Tools. Part 1: Topology, affinity, clock speed
Multicore Performance and Tools Part 1: Topology, affinity, clock speed Tools for Node-level Performance Engineering Gather Node Information hwloc, likwid-topology, likwid-powermeter Affinity control and
More informationRWC PC Cluster II and SCore Cluster System Software High Performance Linux Cluster
RWC PC Cluster II and SCore Cluster System Software High Performance Linux Cluster Yutaka Ishikawa Hiroshi Tezuka Atsushi Hori Shinji Sumimoto Toshiyuki Takahashi Francis O Carroll Hiroshi Harada Real
More informationAgenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationImproving the Performance of OpenMP by Array Privatization
Improving the Performance of OpenMP by Array Privatization Zhenying Liu, Barbara Chapman, Tien-Hsiung Weng, Oscar Hernandez Department of Computer Science, University of Houston { zliu, chapman, thweng,
More informationEfficient Dynamic Parallelism with OpenMP on Linux SMPs
Efficient Dynamic Parallelism with OpenMP on Linux SMPs Christos D. Antonopoulos, Ioannis E. Venetis, Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou High Performance Information Systems Laboratory
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationA Multiprogramming Aware OpenMP Implementation
Scientific Programming 11 (2003) 133 141 133 IOS Press A Multiprogramming Aware OpenMP Implementation Vasileios K. Barekas, Panagiotis E. Hadjidoukas, Eleftherios D. Polychronopoulos and Theodore S. Papatheodorou
More information1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008
1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction
More informationScientific Programming in C XIV. Parallel programming
Scientific Programming in C XIV. Parallel programming Susi Lehtola 11 December 2012 Introduction The development of microchips will soon reach the fundamental physical limits of operation quantum coherence
More informationCode Auto-Tuning with the Periscope Tuning Framework
Code Auto-Tuning with the Periscope Tuning Framework Renato Miceli, SENAI CIMATEC renato.miceli@fieb.org.br Isaías A. Comprés, TUM compresu@in.tum.de Project Participants Michael Gerndt, TUM Coordinator
More informationAn Introduction to OpenMP
An Introduction to OpenMP U N C L A S S I F I E D Slide 1 What Is OpenMP? OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
More informationParallel applications controllable by the resource manager. Operating System Scheduler
An OpenMP Implementation for Multiprogrammed SMPs Vasileios K. Barekas, Panagiotis E. Hadjidoukas, Eleftherios D. Polychronopoulos and Theodore S. Papatheodorou High Performance Information Systems Laboratory,
More informationComparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster G. Jost*, H. Jin*, D. an Mey**,F. Hatay*** *NASA Ames Research Center **Center for Computing and Communication, University of
More informationCS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012
CS4961 Parallel Programming Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms Administrative Mailing list set up, everyone should be on it - You should have received a test mail last night
More informationSlurm Configuration Impact on Benchmarking
Slurm Configuration Impact on Benchmarking José A. Moríñigo, Manuel Rodríguez-Pascual, Rafael Mayo-García CIEMAT - Dept. Technology Avda. Complutense 40, Madrid 28040, SPAIN Slurm User Group Meeting 16
More informationRuntime Address Space Computation for SDSM Systems
Runtime Address Space Computation for SDSM Systems Jairo Balart Outline Introduction Inspector/executor model Implementation Evaluation Conclusions & future work 2 Outline Introduction Inspector/executor
More informationCluster Computing. Performance and Debugging Issues in OpenMP. Topics. Factors impacting performance. Scalable Speedup
Topics Scalable Speedup and Data Locality Parallelizing Sequential Programs Breaking data dependencies Avoiding synchronization overheads Performance and Debugging Issues in OpenMP Achieving Cache and
More informationOpenMP Tutorial. Seung-Jai Min. School of Electrical and Computer Engineering Purdue University, West Lafayette, IN
OpenMP Tutorial Seung-Jai Min (smin@purdue.edu) School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 1 Parallel Programming Standards Thread Libraries - Win32 API / Posix
More informationShared Memory Programming with OpenMP (3)
Shared Memory Programming with OpenMP (3) 2014 Spring Jinkyu Jeong (jinkyu@skku.edu) 1 SCHEDULING LOOPS 2 Scheduling Loops (2) parallel for directive Basic partitioning policy block partitioning Iteration
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationUpdate of Post-K Development Yutaka Ishikawa RIKEN AICS
Update of Post-K Development Yutaka Ishikawa RIKEN AICS 11:20AM 11:40AM, 2 nd of November, 2017 FLAGSHIP2020 Project Missions Building the Japanese national flagship supercomputer, post K, and Developing
More informationWorkload Characterization using the TAU Performance System
Workload Characterization using the TAU Performance System Sameer Shende, Allen D. Malony, and Alan Morris Performance Research Laboratory, Department of Computer and Information Science University of
More informationCSE 392/CS 378: High-performance Computing - Principles and Practice
CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 9: Performance tuning Sources of overhead There are 6 main causes of poor performance in shared memory parallel programs: sequential code communication load imbalance synchronisation
More informationShared Memory Programming with OpenMP
Shared Memory Programming with OpenMP (An UHeM Training) Süha Tuna Informatics Institute, Istanbul Technical University February 12th, 2016 2 Outline - I Shared Memory Systems Threaded Programming Model
More informationHPC Practical Course Part 3.1 Open Multi-Processing (OpenMP)
HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) V. Akishina, I. Kisel, G. Kozlov, I. Kulakov, M. Pugach, M. Zyzak Goethe University of Frankfurt am Main 2015 Task Parallelism Parallelization
More informationCompiler optimization for OpenMP programs
131 Compiler optimization techniques for OpenMP programs Shigehisa Satoh a,, Kazuhiro Kusano b and Mitsuhisa Sato c Tsukuba Research Center, Real World Computing Partnership, 1-6-1 Takezono, Tsukuba, Ibaraki
More informationMegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology
MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology Λ y z x Taisuke Boku Λ Mitsuhisa Sato Λ Daisuke Takahashi Λ Hiroshi Nakashima Hiroshi Nakamura Satoshi Matsuoka x Yoshihiko Hotta
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationA Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh
A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory
More informationModule 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program
The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives
More informationOptimizing OpenMP Programs on Software Distributed Shared Memory Systems
Optimizing OpenMP Programs on Software Distributed Shared Memory Systems Seung-Jai Min, Ayon Basumallik, Rudolf Eigenmann School of Electrical and Computer Engineering Purdue University, West Lafayette,
More information15-418, Spring 2008 OpenMP: A Short Introduction
15-418, Spring 2008 OpenMP: A Short Introduction This is a short introduction to OpenMP, an API (Application Program Interface) that supports multithreaded, shared address space (aka shared memory) parallelism.
More informationExecution model of three parallel languages: OpenMP, UPC and CAF
Scientific Programming 13 (2005) 127 135 127 IOS Press Execution model of three parallel languages: OpenMP, UPC and CAF Ami Marowka Software Engineering Department, Shenkar College of Engineering and Design,
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of
More informationOpenMP 4.0/4.5. Mark Bull, EPCC
OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all
More informationA Source-to-Source OpenMP Compiler
A Source-to-Source OpenMP Compiler Mario Soukup and Tarek S. Abdelrahman The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto Toronto, Ontario, Canada M5S 3G4
More informationLAB. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers
LAB Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012 1 Discovery
More informationOpenMP 4.0. Mark Bull, EPCC
OpenMP 4.0 Mark Bull, EPCC OpenMP 4.0 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all devices!
More informationPROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec
PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization
More informationAdvanced OpenMP. Lecture 11: OpenMP 4.0
Advanced OpenMP Lecture 11: OpenMP 4.0 OpenMP 4.0 Version 4.0 was released in July 2013 Starting to make an appearance in production compilers What s new in 4.0 User defined reductions Construct cancellation
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationPROGRAMMING MODELS FOR HIGH PERFORMANCE COMPUTING
PROGRAMMING MODELS FOR HIGH PERFORMANCE COMPUTING Barbara Chapman, University of Houston and University of Southampton, chapman@cs.uh.edu Abstract High performance computing (HPC), or supercomputing, refers
More informationThe Design of MPI Based Distributed Shared Memory Systems to Support OpenMP on Clusters
The Design of MPI Based Distributed Shared Memory Systems to Support OpenMP on Clusters IEEE Cluster 2007, Austin, Texas, September 17-20 H sien Jin Wong Department of Computer Science The Australian National
More informationTowards OpenMP for Java
Towards OpenMP for Java Mark Bull and Martin Westhead EPCC, University of Edinburgh, UK Mark Kambites Dept. of Mathematics, University of York, UK Jan Obdrzalek Masaryk University, Brno, Czech Rebublic
More informationGPU GPU CPU. Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3
/CPU,a),2,2 2,2 Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3 XMP XMP-dev CPU XMP-dev/StarPU XMP-dev XMP CPU StarPU CPU /CPU XMP-dev/StarPU N /CPU CPU. Graphics Processing Unit GP General-Purpose
More informationEPL372 Lab Exercise 5: Introduction to OpenMP
EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf
More information2
A Study of High Performance Communication Using a Commodity Network of Parallel Computers 2000 Shinji Sumimoto 2 THE SUMMARY OF PH.D DISSERTATION 3 The increasing demands of information processing requires
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationEffective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution Walden Ko, Mark Yankelevsky, Dimitrios S. Nikolopoulos, and Constantine D. Polychronopoulos Center for Supercomputing Research
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationYasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors
Performance Evaluation of Large-scale Parallel Simulation Codes and Designing New Language Features on the (High Performance Fortran) Data-Parallel Programming Environment Project Representative Yasuo
More informationTopics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP
Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and
More informationParallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides
Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationA Characterization of Shared Data Access Patterns in UPC Programs
IBM T.J. Watson Research Center A Characterization of Shared Data Access Patterns in UPC Programs Christopher Barton, Calin Cascaval, Jose Nelson Amaral LCPC `06 November 2, 2006 Outline Motivation Overview
More informationCS420: Operating Systems
Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing
More informationThe MOSIX Scalable Cluster Computing for Linux. mosix.org
The MOSIX Scalable Cluster Computing for Linux Prof. Amnon Barak Computer Science Hebrew University http://www. mosix.org 1 Presentation overview Part I : Why computing clusters (slide 3-7) Part II : What
More informationCS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:
CS550 Advanced Operating Systems (Distributed Operating Systems) Instructor: Xian-He Sun Email: sun@iit.edu, Phone: (312) 567-5260 Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment
More informationCSE 160 Lecture 9. Load balancing and Scheduling Some finer points of synchronization NUMA
CSE 160 Lecture 9 Load balancing and Scheduling Some finer points of synchronization NUMA Announcements The Midterm: Tuesday Nov 5 th in this room Covers everything in course through Thu. 11/1 Closed book;
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationParallel C++ Programming System on Cluster of Heterogeneous Computers
Parallel C++ Programming System on Cluster of Heterogeneous Computers Yutaka Ishikawa, Atsushi Hori, Hiroshi Tezuka, Shinji Sumimoto, Toshiyuki Takahashi, and Hiroshi Harada Real World Computing Partnership
More informationEfficient Execution of Parallel Java Applications
Efficient Execution of Parallel Java Applications Jordi Guitart, Xavier Martorell, Jordi Torres and Eduard Ayguadé European Center for Parallelism of Barcelona (CEPBA) Computer Architecture Department,
More informationMegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology
MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology Λ y z x Taisuke Boku Λ Mitsuhisa Sato Λ Daisuke Takahashi Λ Hiroshi Nakashima Hiroshi Nakamura Satoshi Matsuoka x Yoshihiko Hotta
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationResearch on the Implementation of MPI on Multicore Architectures
Research on the Implementation of MPI on Multicore Architectures Pengqi Cheng Department of Computer Science & Technology, Tshinghua University, Beijing, China chengpq@gmail.com Yan Gu Department of Computer
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture
More informationOpenMP - Introduction
OpenMP - Introduction Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı - 21.06.2012 Outline What is OpenMP? Introduction (Code Structure, Directives, Threads etc.) Limitations Data Scope Clauses Shared,
More information