I read this, arrived here, and I think I should (if not, please tell me) rewrite the code

    {
        int i = 0;
        char hostname[256];
        gethostname(hostname, sizeof(hostname));
        printf("PID %d on %s ready for attach\n", getpid(), hostname);
        fflush(stdout);
        /* spin here until a debugger sets i to a non-zero value */
        while (0 == i)
            sleep(5);
    }

in Fortran. From this answer I understood that in Fortran I could use mpi_get_processor_name in place of gethostname. Everything else is simple, except the flush. What about it?
Where should I put it? In the main program, after mpi_init? And then? What should I do?
As for the compile options, I referred to this, and used the -v -da -Q options of the mpifort wrapper.
This solution doesn't fit my case, since I need to run my program on 27 processes minimum, and I'd like to check 1 process only.
The simplest approach:
What I do is run the MPI job locally and see what it does, without any of the above code. If it hangs, I use top to find out the PIDs of the processes, and one can usually guess the rank from the PIDs (they tend to be consecutive and the lowest one is rank 0). Below, rank 0 is process 1641, rank 1 is PID 1642, and so on...

     PID USER  PR  NI    VIRT   RES   SHR S  %CPU  %MEM    TIME+ COMMAND
    1642 me    20   0  167328  7716  5816 R 100.0 0.047  0:25.02 a.out
    1644 me    20   0  167328  7656  5756 R 100.0 0.047  0:25.04 a.out
    1645 me    20   0  167328  7700  5792 R 100.0 0.047  0:24.97 a.out
    1646 me    20   0  167328  7736  5836 R 100.0 0.047  0:25.00 a.out
    1641 me    20   0  167328  7572  5668 R 99.67 0.046  0:24.95 a.out

Then I run gdb -pid with the chosen PID and examine the stack and the local variables in the processes. (Use help stack in the gdb console.)
The important one is the backtrace, so type bt in the console.
This works well when examining deadlocks. It works less well when you have to stop at a specific place; then you have to attach the debugger early.
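
The top-and-attach routine above can be scripted when there are many ranks. The following is only a sketch under stated assumptions: all ranks run on the local machine, the executable is called a.out, and PIDs really are consecutive with the lowest one being rank 0 (a heuristic, not a guarantee). The function names guess_ranks and dump_backtraces are made up for illustration; pgrep -x and gdb's -p, -batch and -ex options are standard.

```shell
#!/bin/sh
# guess_ranks: read PIDs (one per line) on stdin, sort them numerically
# and print "guessed_rank pid" pairs. The lowest PID is assumed to be rank 0.
guess_ranks() {
    sort -n | awk 'NF { printf "%d %s\n", n++, $1 }'
}

# dump_backtraces NAME: attach gdb in batch mode to every process whose
# name is exactly NAME and print its backtrace, then detach again.
dump_backtraces() {
    pgrep -x "$1" | guess_ranks | while read -r rank pid; do
        echo "=== guessed rank $rank (pid $pid) ==="
        # </dev/null keeps gdb from consuming the PID list on stdin
        gdb -p "$pid" -batch -ex bt </dev/null 2>&1
    done
}

# Example (while the MPI job is hanging):
#   dump_backtraces a.out
```

Attaching may require the same user as the job and, on some systems, relaxed ptrace restrictions. For a deadlock, comparing the backtraces of all ranks side by side is often enough to see which MPI call each one is stuck in.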
Your code:
I don't think the flush is necessary in Fortran. I think the Fortran write and print flush as necessary, at least in the compilers I use.
But you can use the flush statement

    use iso_fortran_env
    flush(output_unit)

Just put the flush after the write where you print the hostname and the pid. But as I said, I would start with the printing alone.
What you then do is log in to the node and attach gdb to the right process with something like

    gdb -pid 12345

For the sleep you can use the non-standard sleep intrinsic subroutine available in many compilers, or write your own.
Whether before or after mpi_init? If you want to print the rank, it must be after. Also, for using mpi_get_processor_name it must be after. It is generally recommended to call mpi_init as early as possible in your program.
The code is then something like

      use mpi

      implicit none

      character(mpi_max_processor_name) :: hostname
      integer :: rank, ie, pid, hostname_len
      integer, volatile :: i

      call mpi_init(ie)

      call mpi_get_processor_name(hostname, hostname_len, ie)
      !non-standard extension
      pid = getpid()

      call mpi_comm_rank(mpi_comm_world, rank, ie)

      write(*,*) "PID ", pid, " on ", trim(hostname), " ready for attach world rank ", rank

      !this serves to block the execution at a specific place until you unblock it in gdb by setting i=0
      i = 1
      do
        !non-standard extension
        call sleep(1)
        if (i==0) exit
      end do

      !shut MPI down cleanly once the debugger has released the loop
      call mpi_finalize(ie)
    end

Important note: if you compile with optimizations, the compiler can see that i==0 is never true and will remove the check completely. You must lower your optimizations or declare i as volatile. Volatile means that the value can change at any time and the compiler must reload its value from memory for the check. That requires Fortran 2003.
Attaching to the right process:
The above code will print, for example,

    > mpif90 -ggdb mpi_gdb.f90
    > mpirun -n 4 ./a.out
     PID 2356 on linux.site ready for attach world rank 1
     PID 2357 on linux.site ready for attach world rank 2
     PID 2358 on linux.site ready for attach world rank 3
     PID 2355 on linux.site ready for attach world rank 0

In top they look like

     PID USER  PR  NI    VIRT   RES   SHR S  %CPU  %MEM    TIME+ COMMAND
    2355 me    20   0  167328  7452  5564 R 100.0 0.045  1:42.55 a.out
    2356 me    20   0  167328  7428  5548 R 100.0 0.045  1:42.54 a.out
    2357 me    20   0  167328  7384  5500 R 100.0 0.045  1:42.54 a.out
    2358 me    20   0  167328  7388  5512 R 100.0 0.045  1:42.51 a.out

and you just select which rank you want and execute

    gdb -pid 2355

to attach to rank 0, and so on. In a different terminal window, of course.
Then you get something like

    main__ () at mpi_gdb.f90:26
    26          if (i==0) exit
    (gdb) info locals
    hostname = 'linux.site', ' ' <repeats 246 times>
    hostname_len = 10
    i = 1
    ie = 0
    pid = 2457
    rank = 0
    (gdb) set var i = 0
    (gdb) cont
    Continuing.
    [Inferior 1 (process 2355) exited normally]

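
With 27 or more ranks, picking the PID for one rank by eye gets tedious. If the program's output is saved to a file, the lookup can be done by a small helper. This is a sketch, not part of the original setup: the function name pid_for_rank and the file name ranks.log are hypothetical, and it assumes each rank prints one line whose first word is PID, whose second word is the process ID, and whose last word is the rank, as in the write statement above.

```shell
#!/bin/sh
# pid_for_rank RANK FILE: print the PID of the process that reported
# world rank RANK in FILE (one "PID <pid> on <host> ... rank <n>" line per rank).
pid_for_rank() {
    awk -v r="$1" 'tolower($1) == "pid" && $NF == r { print $2; exit }' "$2"
}

# Example, assuming the job was started with
#   mpirun -n 27 ./a.out | tee ranks.log
# and that rank 0 runs on this machine:
#   gdb -pid "$(pid_for_rank 0 ranks.log)"
```

When the output goes through a pipe like this, the lines may be held back by buffering, in which case the flush after the write becomes useful. The PID is only meaningful on the host named in the same line, so on a cluster you still have to log in to that node first.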