Enhanced Memory Debugging of MPI-parallel Applications in Open MPI


1 Enhanced Memory Debugging of MPI-parallel Applications in Open MPI
4th Parallel Tools Workshop 2010
Shiqing Fan
HLRS, High Performance Computing Center, University of Stuttgart, Germany
Slide 1

2 Introduction: Open MPI 1/3
- A new MPI implementation from scratch, without the cruft of previous implementations
  - Design started in early 2004
  - Merges ideas from PACX-MPI, LAM/MPI, LA-MPI and FT-MPI
- Project goals:
  - Full, fast and extensible MPI-2 implementation
  - Thread safety
  - Prevent the forking problem
  - Combine the best ideas and technologies
- Open-source license based on the BSD license
Slide 2

3 Introduction: Open MPI 2/3
- Current status:
  - Stable version v1.2.6 (April 2008)
  - Release v1.3 is coming very soon
- 14 members, 6 contributors:
  - 4 US DOE labs
  - 8 universities
  - 7 vendors
  - 1 individual
Slide 3

4 Introduction: Open MPI 3/3
- Open MPI consists of three sub-packages layered on top of the operating system:
  - Open MPI (the MPI layer)
  - Open RTE: Open Run-Time Environment
  - Open PAL: Open Portable Access Layer
- Modular Component Architecture (MCA):
  - Dynamically loads available modules, plug-in style, and checks for hardware
  - Selects the best plug-in and unloads the others (e.g. if the hardware is not available)
  - Fast indirect calls into each plug-in
[Figure: the user application calls the MPI API; underneath, the Modular Component Architecture manages framework components such as the BTLs for OpenIB, TCP, Myrinet and shared memory (SM)]
Slide 4

5 Introduction: Valgrind 1/2
- An open-source debugging & profiling tool
- For x86/Linux, AMD64/Linux, PPC32/Linux and PPC64/Linux
- Works with any dynamically & statically linked application
- Memcheck: a heavyweight memory checker
  - Runs the program on a synthetic CPU, identical to a real CPU, that additionally stores information about memory:
    - Valid-value bits (V-bits) per bit: does it hold a valid value?
    - Address bits (A-bits) per byte: is it possible to read/write that location?
  - All reads and writes of memory are checked
  - Calls to malloc/new/free/delete are intercepted
Slide 5

6 Introduction: Valgrind 2/2
- Use of uninitialized memory: the error is only reported when the uninitialized value is actually used, e.g.:
    int c[2];
    int i = c[0];   /* OK!! */
    if (i == 0)     /* Memcheck: use of uninitialized value!! */
- Use of freed memory
- Mismatched use of malloc/new with free/delete
- Memory leaks
- Overlapping src and dst blocks in memcpy(), strcpy(), strncpy(), strcat(), strncat()
Slide 6
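For illustration, here is a minimal C program (not from the slides) that triggers three of these bug classes; compile with gcc -g and run it as valgrind --leak-check=full ./a.out:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p = malloc(16);
        free(p);
        p[0] = 'x';                /* Memcheck: invalid write (use of freed memory) */

        char buf[16] = "0123456789abcde";
        memcpy(buf, buf + 4, 8);   /* Memcheck: source and destination overlap */

        char *leak = malloc(64);   /* never freed ... */
        leak = NULL;               /* ... and now unreachable: definitely lost */
        return 0;
    }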

7 Valgrind MPI Example 1/2
Open MPI readily supports running applications under Valgrind:
    mpirun -np 2 valgrind ./mpi_murks
Slide 7
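In practice it pays to pass a few Memcheck options along. Open MPI also ships a suppression file for known-harmless reports originating in the library itself; the path below assumes a standard installation prefix:

    mpirun -np 2 valgrind --leak-check=full \
           --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp \
           ./mpi_murks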

8 Valgrind MPI Example 2/2
    mpirun -np 2 valgrind ./mpi_murks:
    ==11278== Invalid read of size 1
    ==11278==    at 0x...E: memcpy (../../memcheck/mac_replace_strmem.c:256)
    ==11278==    by 0x80690F6: MPID_SHMEM_Eagerb_send_short (mpich/../shmemshort.c:70)
    ... 2 lines of calls to MPICH functions deleted ...
    ==11278==    by 0x80492BA: MPI_Send (/usr/src/mpich/src/pt2pt/send.c:91)
    ==11278==    by 0x8048F28: main (mpi_murks.c:44)
    ==11278== Address 0x4158B0EF is 3 bytes after a block of size 40 alloc'd
    ==11278==    at 0x4002BBCE: malloc (../../coregrind/vg_replace_malloc.c:160)
    ==11278==    by 0x8048EB0: main (mpi_murks.c:39)
    ==11278== Conditional jump or move depends on uninitialised value(s)
    ==11278==    at 0x402985C4: _IO_vfprintf_internal (in /lib/libc...so)
    ==11278==    by 0x402A15BD: _IO_printf (in /lib/libc...so)
    ==11278==    by 0x8048F44: main (mpi_murks.c:46)
11278 is the PID of the reporting process. The first report is a buffer overrun by 4 bytes in MPI_Send; the second is the printing of an uninitialized variable.
What Valgrind cannot find: one pending Recv when run with 1 process, and unmatched Sends when run with more than 2 processes (both detected by Marmot).
Slide 8

9 Design and Implementation 1/3
- Memchecker: a new concept that uses Valgrind's API internally in Open MPI to reveal bugs
  - in the application
  - in Open MPI itself
- Implements the generic interface memchecker as an MCA framework
- Implemented in the Open PAL layer
- Configure option: --enable-memchecker
  - Optionally pass an installed Valgrind: --with-valgrind=/path/to/valgrind
- Then simply run the command, e.g.:
    mpirun -np 2 valgrind ./my_mpi
[Figure: Open MPI / Open RTE / Open PAL stack on top of the operating system; within Open PAL, the memchecker framework with a valgrind component, a solaris_rtc component* and other MCA components. *currently no API implemented in rtc]
Slide 9
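A typical build-and-run sequence under these options might look as follows (prefix and Valgrind paths are placeholders, not from the slides):

    ./configure --prefix=$HOME/openmpi-memchecker \
                --enable-debug \
                --enable-memchecker \
                --with-valgrind=/path/to/valgrind
    make all install
    mpirun -np 2 valgrind ./my_mpi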

10 Design and Implementation 2/3
- Detects the application's memory violations of the MPI standard:
  - usage of undefined data
  - memory accesses that are illegal under MPI semantics
- Detects non-blocking/one-sided communication buffer errors:
  - hooked in BTL-layer functions, covering both kinds of communication
  - memory accessibility is set independently of the MPI operation, i.e. only for the fragment currently being sent/received
  - derived datatypes are handled
- MPI object checking (see the sketch below):
  - checks the definedness of MPI objects passed to the MPI API: MPI_Status, MPI_Comm, MPI_Request and MPI_Datatype
  - can be disabled for better performance
Slide 10
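The object checking can be expressed with Memcheck's public client requests. The following is only a sketch of the idea, not the actual Open MPI source:

    #include <mpi.h>
    #include <valgrind/memcheck.h>

    /* Sketch: verify that every byte of an MPI_Status passed into the MPI API
     * is addressable and defined. VALGRIND_CHECK_MEM_IS_DEFINED returns 0 if
     * the whole range is defined, otherwise the first offending address. */
    static int status_is_defined(const MPI_Status *status)
    {
        return VALGRIND_CHECK_MEM_IS_DEFINED(status, sizeof(MPI_Status)) == 0;
    }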

11 Design and Implementation 3/3
Non-blocking send/receive buffer error checking:
[Figure: on Proc0, MPI_Isend makes the send buffer inaccessible (unaddressable) at the MPI layer; on Proc1, MPI_Irecv does the same for the receive buffer. The data moves as fragments (Frag0 ... Fragn) through the PML (P2P Management Layer), BML (BTL Management Layer) and BTL (Byte Transfer Layer), with in-flight fragments marked not accessible; MPI_Wait on each side makes the buffer accessible again.]
Slide 11
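Conceptually, the memchecker component brackets each buffer with Memcheck client requests, roughly like this (a sketch; the real hooks live inside Open MPI's PML/BTL, not in user code):

    #include <valgrind/memcheck.h>

    /* At MPI_Isend: the user must not touch the buffer while MPI owns it. */
    VALGRIND_MAKE_MEM_NOACCESS(buffer, SIZE * sizeof(int));

    /* ... transfer in progress: any load or store is reported ... */

    /* At MPI_Wait: hand the buffer back to the user, contents defined. */
    VALGRIND_MAKE_MEM_DEFINED(buffer, SIZE * sizeof(int));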

12 Detectable Bug Classes 1/3
Non-blocking buffer accessed/modified before the operation has finished:
    MPI_Isend (buffer, SIZE, MPI_INT, ..., &request);
    buffer[1] = 4711;        /* write before completion */
    MPI_Wait (&request, &status);
The standard does not (yet) allow read access either:
    MPI_Isend (buffer, SIZE, MPI_INT, ..., &request);
    result[1] = buffer[1];   /* read before completion */
    MPI_Wait (&request, &status);
Side note: MPI-1, p30, rationale for the restrictive access rules: they allow better performance on some systems.
Slide 12
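A self-contained reproducer of the read-access case (illustrative; run it with mpirun -np 2 under the memchecker-enabled build from slide 9 to see the report):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, buffer[4] = {1, 2, 3, 4}, result[4];
        MPI_Request request;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(buffer, 4, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
            result[1] = buffer[1];   /* read before completion: flagged */
            MPI_Wait(&request, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buffer, 4, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        (void)result;
        MPI_Finalize();
        return 0;
    }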

13 Detectable Bug Classes 2/3
Access to a buffer still under the control of MPI:
    MPI_Irecv (buffer, SIZE, MPI_CHAR, ..., &request);
    buffer[1] = 4711;        /* write while MPI owns the buffer */
    MPI_Wait (&request, &status);
Side note: CRC-based methods do not reliably catch these cases.
Memory outside the receive buffer is overwritten:
    buffer = malloc (SIZE * sizeof(MPI_CHAR));
    memset (buffer, SIZE * sizeof(MPI_CHAR), 0);
    MPI_Recv (buffer, SIZE+1, MPI_CHAR, ..., &status);
Side note: MPI-1, p21, rationale on overflow situations: no memory outside the receive buffer will ever be overwritten.
Slide 13

14 Detectable Bug Classes 3/3
Usage of undefined memory passed back from Open MPI:
    MPI_Wait (&request, &status);
    if (status.MPI_ERROR != MPI_SUCCESS) ...
Side note: this field should remain undefined; MPI-1, p22 (not needed for calls that return only one status); MPI-2, p24 (clarification of status in single-completion calls).
Write to a buffer before an accumulate has finished:
    MPI_Accumulate (A, NROWS*NCOLS, MPI_INT, 1, 0, 1,
                    xpose, MPI_SUM, win);
    A[0][1] = 4711;          /* write before the fence */
    MPI_Win_fence (0, win);
Slide 14
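For contrast, a sketch of where status.MPI_ERROR is meaningful: after a multiple-completion call such as MPI_Waitall, and only when the error handler (e.g. MPI_ERRORS_RETURN) lets the call return MPI_ERR_IN_STATUS:

    #include <mpi.h>

    /* Sketch: per-request errors are reported through the status array. */
    static void wait_both(MPI_Request requests[2])
    {
        MPI_Status statuses[2];
        if (MPI_Waitall(2, requests, statuses) == MPI_ERR_IN_STATUS) {
            for (int i = 0; i < 2; ++i) {
                if (statuses[i].MPI_ERROR != MPI_SUCCESS) {
                    /* handle the i-th failed request */
                }
            }
        }
    }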

15 Performance 1/2
- Benchmark: Intel MPI Benchmark (IMB)
- Environment:
  - D-Grid cluster at HLRS
  - Dual-processor Intel Woodcrest nodes
  - InfiniBand DDR network with the OpenFabrics stack
- Test cases:
  - plain Open MPI
  - Open MPI with the memchecker component, without MPI object checking
Slide 15

16 Performance 2/2
- Intel MPI Benchmark, bi-directional get test
- Uses 2 nodes; TCP connections employing the IP-over-InfiniBand interface
- Run with and without Valgrind
Slide 16

17 Valgrind (Memcheck) Extension 1/2
- New client requests for:
  - watching memory read operations
  - watching memory write operations
  - initiating callback functions on memory reads/writes
  - making memory readable and/or writable
- Implementation:
  - uses a fast ordered-set algorithm
  - byte-wise memory checking
  - handles memory with mixed registered and unregistered blocks
Slide 17

18 Valgrind (Memcheck) Extension 2/2
    VALGRIND_REG_USER_MEM_WATCH (addr, len, op, cb, info)
    VALGRIND_UNREG_USER_MEM_WATCH (addr, len)
The watch op can be WATCH_MEM_READ, WATCH_MEM_WRITE or WATCH_MEM_RW.
[Figure: sequence diagram between the user application and Valgrind: the application allocates memory (Alloc_mem) and registers a watch; when the application later reads the memory (Read_mem), Valgrind invokes the registered callback (Read_cb)]
Slide 18
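Usage might look like the following sketch; the macro names and WATCH_MEM_* constants come from the slide, but the callback signature is hypothetical:

    /* Hypothetical callback: invoked by Valgrind on each read of the range. */
    static void read_cb(void *addr, void *info)
    {
        /* e.g. report an illegal read of a buffer currently owned by MPI */
    }

    static void watch_reads(void *buffer, long len)
    {
        VALGRIND_REG_USER_MEM_WATCH(buffer, len, WATCH_MEM_READ, read_cb, NULL);
        /* ... every read of [buffer, buffer+len) now triggers read_cb ... */
        VALGRIND_UNREG_USER_MEM_WATCH(buffer, len);
    }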

19 Thank you very much! Slide 19
