From: Arash Azari (arashazari_at_yahoo.com)
Date: Wed Apr 27 2011 - 23:49:56 CDT
Dear NAMD users,
I have looked everywhere for a clue about my problem and found nothing. Maybe I have done something silly, so please bear with me!
Our cluster administrator told me that he does not know how to install NAMD on a GPU cluster, so I did it myself. The cluster has 24 nodes connected by gigabit Ethernet, with 8 CPUs and 1 GPU unit per node (an NVIDIA S2050 GPU box is connected to each node; each box holds 4 Fermi C2050 cards). I followed the release-notes instructions for the MPI build (source installation of 2.8b1, using the same commands given in the notes). I only changed (added extra options to) the following line:
./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64 --with-cuda --cuda-prefix /usr/local/cuda/
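For completeness, after that config step I followed the remaining commands from the notes, i.e. roughly (the directory name comes from the build architecture above):
cd Linux-x86_64-g++
make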
As long as I use 1 CPU it works (./charmrun +p1 ./namd2 +idlepoll src/alanin), but when I try to increase the number of CPUs (and I actually do not know how NAMD allocates the GPUs), I get the following error message.
For example, running:
mpirun -np 2 ./namd2 src/alanin
or
./charmrun +p2 ./namd2 +idlepoll src/alanin
produces this error:
Info:
Info: Entering startup at 5.90103 s, 210.891 MB of memory in use
Info: Startup phase 0 took 9.98974e-05 s, 210.891 MB of memory in use
Info: Startup phase 1 took 0.000334024 s, 210.891 MB of memory in use
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA-enabled NAMD requires at least one patch per process.
Info: Startup phase 2 took 0.000103951 s, 210.891 MB of memory in use
Info: Startup phase 3 took 8.10623e-05 s, 210.891 MB of memory in use
FATAL ERROR: CUDA-enabled NAMD requires at least one patch per process.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
[0] Stack Traceback:
[0:0] _Z8NAMD_diePKc+0x4a [0x4f5b7a]
[0:1] _ZN11WorkDistrib12patchMapInitEv+0x4ea [0x83ce1a]
[0:2] _ZN4Node7startupEv+0x6db [0x7b37ab]
[0:3] _Z15_processHandlerPvP11CkCoreState+0x599 [0x889d29]
[0:4] CsdScheduler+0x41a [0x9846ea]
[0:5] _ZN9ScriptTcl3runEv+0x78 [0x806ea8]
[0:6] _Z18after_backend_initiPPc+0x234 [0x4f85a4]
[0:7] main+0x24 [0x4f8854]
[0:8] __libc_start_main+0xf4 [0x3a3a21d994]
[0:9] __gxx_personality_v0+0x3e9 [0x4f52b9]
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 23661 on node gpu01 exiting without calling
"finalize". This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
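In case it matters, I have not tried to control the GPU assignment myself. I have seen mention of a +devices option for telling each process which cards to use, something along the lines of
./charmrun +p2 ./namd2 +idlepoll +devices 0,1 src/alanin
but I do not know whether that is related to this error.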
Could you please help me solve this problem? Is there something wrong with my installation?
I greatly appreciate your help.
Regards,
Arash
Arash Azari