c++ - Using GDB to debug an MPI program in Fortran


I read this and arrived here, so now I think I should (if not, please tell me) rewrite the code

    {
        int i = 0;
        char hostname[256];
        gethostname(hostname, sizeof(hostname));
        printf("PID %d on %s ready for attach\n", getpid(), hostname);
        fflush(stdout);
        while (0 == i)
            sleep(5);
    }

in Fortran. From this answer I understood that in Fortran I can use MPI_Get_processor_name in place of gethostname. Everything else is simple except the flush. What about it?

Where should I put it? In the main program, after MPI_Init? And then what should I do?

As for the compile options, I referred to this and passed -v -da -Q to the mpifort wrapper.

This solution doesn't fit my case, since I need to run the program on at least 27 processes and I'd like to check one process only.

The simplest approach:

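What I actually do is run the MPI job locally and see what it does, without any of the above code. If it hangs, I use top to find the PIDs of the processes, and one can usually guess the ranks from the PIDs (they tend to be consecutive, and the lowest one is rank 0). Below, rank 0 is process 1641, rank 1 is PID 1642, and so on...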

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     1642 me        20   0  167328   7716   5816 R 100.0 0.047   0:25.02 a.out
     1644 me        20   0  167328   7656   5756 R 100.0 0.047   0:25.04 a.out
     1645 me        20   0  167328   7700   5792 R 100.0 0.047   0:24.97 a.out
     1646 me        20   0  167328   7736   5836 R 100.0 0.047   0:25.00 a.out
     1641 me        20   0  167328   7572   5668 R 99.67 0.046   0:24.95 a.out

Then I run gdb -pid <PID> and examine the stack and the local variables in the processes (use help stack in the GDB console).

The most important thing is to get a backtrace, so just type bt in the console.

This works well when examining deadlocks. It works less well when you have to stop at a specific place; in that case you have to attach the debugger early.
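For a hung job with many ranks, the attach-and-bt steps above can be scripted. The following is only a sketch under some assumptions: gdb and pgrep are installed, the executable is called a.out as in the top listing, and you are allowed to attach to those processes. The function name bt_all is made up for illustration.

```shell
# Hypothetical helper: print a backtrace for every running process whose
# name is exactly "$1" (defaults to a.out, the name in the top listing).
bt_all() {
    name="${1:-a.out}"
    # pgrep -x matches the process name exactly and prints one PID per line.
    for pid in $(pgrep -x "$name"); do
        echo "=== PID $pid ==="
        # -batch runs the -ex commands, then detaches and exits;
        # -p attaches to an already running process.
        gdb -batch -ex "bt" -p "$pid"
    done
}
```

Calling bt_all a.out on the node where the job runs prints one backtrace per process, which is usually enough to see which ranks are stuck in which MPI call.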


Your code:

I don't think the flush is necessary in Fortran. I think Fortran write and print flush as necessary, at least in the compilers I use.

But you can certainly use the flush statement

    use iso_fortran_env
    flush(output_unit)

Just put the flush after the write where you print the hostname and PID. But as I said, I would start with printing alone.

What you then do is log in to that node and attach gdb to the right process with something like

gdb -pid 12345 

For the sleep, you can use the non-standard sleep intrinsic subroutine available in many compilers, or write your own.

Before or after MPI_Init? If you want to print the rank, it must be after. The call to MPI_Get_processor_name must also come after. It is normally recommended to call MPI_Init as early as possible in your program.

The code is then something like

    use mpi
    implicit none

    character(MPI_MAX_PROCESSOR_NAME) :: hostname
    integer :: rank, ie, pid, hostname_len
    integer, volatile :: i

    call MPI_Init(ie)
    call MPI_Get_processor_name(hostname, hostname_len, ie)
    !non-standard extension
    pid = getpid()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ie)
    write(*,*) "PID ", pid, " on ", trim(hostname), " ready for attach is world rank ", rank

    !this serves to block execution at a specific place until you unblock it in GDB by setting i=0
    i = 1
    do
      !non-standard extension
      call sleep(1)
      if (i==0) exit
    end do

    !shut MPI down cleanly once the loop has been released
    call MPI_Finalize(ie)
    end

Important note: if you compile with optimizations, the compiler can see that i==0 is never true and may remove the check completely. You must lower the optimization level or declare i as volatile. volatile means that the value can change at any time, so the compiler must reload it from memory for each check. This requires Fortran 2003.

Attaching to the right process:

The above code will print, for example,

    > mpif90 -ggdb mpi_gdb.f90
    > mpirun -n 4 ./a.out
     PID         2356  on linux.site ready for attach is world rank            1
     PID         2357  on linux.site ready for attach is world rank            2
     PID         2358  on linux.site ready for attach is world rank            3
     PID         2355  on linux.site ready for attach is world rank            0

In top they look like

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     2355 me        20   0  167328   7452   5564 R 100.0 0.045   1:42.55 a.out
     2356 me        20   0  167328   7428   5548 R 100.0 0.045   1:42.54 a.out
     2357 me        20   0  167328   7384   5500 R 100.0 0.045   1:42.54 a.out
     2358 me        20   0  167328   7388   5512 R 100.0 0.045   1:42.51 a.out

Then you just select the rank you want and execute

gdb -pid 2355 

to attach to rank 0, and so on. In a different terminal window, of course.

Then you get something like

    MAIN__ () at mpi_gdb.f90:26
    26          if (i==0) exit

    (gdb) info locals
    hostname = 'linux.site', ' ' <repeats 246 times>
    hostname_len = 10
    i = 1
    ie = 0
    pid = 2457
    rank = 0

    (gdb) set var i = 0
    (gdb) cont
    Continuing.
    [Inferior 1 (process 2355) exited normally]
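With 27 or more ranks, picking the PID by eye gets tedious. As a rough sketch, the lookup can be done from the startup lines the program prints. This assumes each line begins with the word PID followed by the process ID and ends with the world rank, as in the sample output; the helper name pid_for_rank and the file name run.log are hypothetical.

```shell
# Hypothetical helper: print the PID of the process that reported world
# rank "$1". Reads the program's startup lines from standard input and
# assumes each line starts with "PID <pid>" and ends with the rank.
pid_for_rank() {
    awk -v r="$1" 'tolower($1) == "pid" && $NF == r { print $2; exit }'
}

# Possible usage (assumes the job's output was saved to run.log):
#   mpirun -n 4 ./a.out | tee run.log        # first terminal
#   gdb -pid "$(pid_for_rank 0 < run.log)"   # second terminal
```

This only finds the right process when gdb is started on the same node that printed the line, so on a cluster the hostname column still has to be checked first.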
