From: Arash Azari (arashazari_at_yahoo.com)
Date: Wed Apr 27 2011 - 23:49:56 CDT
Dear NAMD users,
I have looked everywhere for a clue about my problem and found nothing. Maybe I have done something silly, so please bear with me!
Our cluster administrator told me that he does not know how to install NAMD on a GPU cluster, so I did it myself. The cluster has 24 nodes connected by gigabit Ethernet, with 8 CPUs and 1 GPU unit per node (an NVIDIA S2050 GPU box is connected to each node; each box holds 4 Fermi C2050 cards). I followed the release-notes instructions for the MPI build (source installation of 2.8b1, using the same commands given in the notes). I only changed (added extra options to) the following line:
./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64 --with-cuda --cuda-prefix /usr/local/cuda/
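For completeness, after that config step I followed the remaining commands from the notes, i.e. roughly (the directory name comes from the build architecture above):
cd Linux-x86_64-g++
make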
As long as I use 1 CPU it works (./charmrun +p1 ./namd2 +idlepoll src/alanin), but when I try to increase the number of CPUs (and I actually do not know how NAMD allocates the GPUs), I get the following error message.
For example, running:
mpirun -np 2 ./namd2 src/alanin
or
./charmrun +p2 ./namd2 +idlepoll src/alanin
produces this error:
Info:
Info: Entering startup at 5.90103 s, 210.891 MB of memory in use
Info: Startup phase 0 took 9.98974e-05 s, 210.891 MB of memory in use
Info: Startup phase 1 took 0.000334024 s, 210.891 MB of memory in use
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA-enabled NAMD requires at least one patch per process.
Info: Startup phase 2 took 0.000103951 s, 210.891 MB of memory in use
Info: Startup phase 3 took 8.10623e-05 s, 210.891 MB of memory in use
FATAL ERROR: CUDA-enabled NAMD requires at least one patch per process.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
[0] Stack Traceback:
[0:0] _Z8NAMD_diePKc+0x4a [0x4f5b7a]
[0:1] _ZN11WorkDistrib12patchMapInitEv+0x4ea [0x83ce1a]
[0:2] _ZN4Node7startupEv+0x6db [0x7b37ab]
[0:3] _Z15_processHandlerPvP11CkCoreState+0x599 [0x889d29]
[0:4] CsdScheduler+0x41a [0x9846ea]
[0:5] _ZN9ScriptTcl3runEv+0x78 [0x806ea8]
[0:6] _Z18after_backend_initiPPc+0x234 [0x4f85a4]
[0:7] main+0x24 [0x4f8854]
[0:8] __libc_start_main+0xf4 [0x3a3a21d994]
[0:9] __gxx_personality_v0+0x3e9 [0x4f52b9]
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 23661 on node gpu01 exiting without calling
"finalize". This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
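In case it matters, I have not tried to control the GPU assignment myself. I have seen mention of a +devices option for telling each process which cards to use, something along the lines of
./charmrun +p2 ./namd2 +idlepoll +devices 0,1 src/alanin
but I do not know whether that is related to this error.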
Could you please help me solve this problem? Is there something wrong with my installation?
I greatly appreciate your help.
Regards,
Arash
Arash Azari