From: Damián Montaldo (dmontaldo_at_dc.uba.ar)
Date: Thu Jul 19 2012 - 14:12:06 CDT
Hi, I'm trying to run NAMD with CUDA, using the NAMD_2.9_Linux-x86_64-multicore-CUDA build.
This is what I run and the error I get:
cudarulez_at_n2:~/inputs$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/NAMD_2.9_Linux-x86_64-multicore-CUDA
cudarulez_at_n2:~/inputs$ /opt/NAMD_2.9_Linux-x86_64-multicore-CUDA/namd2 +p4 +idlepoll ubq_wb_eq.conf
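Because the library path is extended by hand before launching namd2, it may help to confirm which CUDA runtime library the loader actually resolves for the binary. A minimal sketch, assuming `ldd` is available; the function name `show_cuda_libs` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA-related libraries that the dynamic
# loader resolves for a given binary, using the current LD_LIBRARY_PATH.
show_cuda_libs() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin"
        return 1
    fi
    # ldd prints one "libname => path (address)" line per dependency;
    # keep only lines mentioning libcuda (this also matches libcudart).
    libs=$(ldd "$bin" 2>/dev/null | grep -i 'libcuda')
    if [ -z "$libs" ]; then
        echo "no CUDA libraries linked: $bin"
    else
        echo "$libs"
    fi
}

show_cuda_libs /opt/NAMD_2.9_Linux-x86_64-multicore-CUDA/namd2
```

Run it in the same shell after the `export`, so the output reflects the library path namd2 will see.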
Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (4-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD 2.9 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Mon Apr 30 14:02:11 CDT 2012 by jim on naiad.ks.uiuc.edu
Info: 1 NAMD 2.9 Linux-x86_64-multicore-CUDA 4 n2 cudarulez
Info: Running on 4 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.019984 s
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (n2): no CUDA-capable device is detected
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (n2): no CUDA-capable device is detected
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (n2): no CUDA-capable device is detected
Program finished.
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (n2): no CUDA-capable device is detected
Segmentation fault
The machine runs an up-to-date Debian GNU/Linux wheezy (so that all the CUDA packages come from the official Debian repositories).
I installed the official Debian NVIDIA drivers, and the CUDA toolkit from Debian as well.
I found related questions about the installed driver in the mailing list archive:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2011-2012/1259.html
but if I run nvidia-detect (from a Debian package), everything seems fine:
$ nvidia-detect
Detected NVIDIA GPUs:
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [GeForce 210] [10de:0a65] (rev a2)
Your card is supported by the default drivers.
I also tried the deviceQuery sample from the NVIDIA SDK (and some of the other examples), and it works too:
n2:~# /opt/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
[deviceQuery] starting...
/opt/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 2 CUDA Capable device(s)
Device 0: "Tesla C1060"
CUDA Driver Version / Runtime Version 4.2 / 4.2
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 4096 MBytes (4294770688 bytes)
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
GPU Clock rate: 1296 MHz (1.30 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 512-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce 210"
CUDA Driver Version / Runtime Version 4.2 / 4.2
CUDA Capability Major/Minor version number: 1.2
Total amount of global memory: 511 MBytes (536150016 bytes)
( 2) Multiprocessors x ( 8) CUDA Cores/MP: 16 CUDA Cores
GPU Clock rate: 1400 MHz (1.40 GHz)
Memory Clock rate: 400 Mhz
Memory Bus Width: 64-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 2, Device = Tesla C1060, Device = GeForce 210
[deviceQuery] test results...
PASSED
> exiting in 3 seconds:
3...2...1...done!
I also tried reinstalling with the official drivers and toolkit from NVIDIA, and I'm stuck with the same error.
I found a possible issue with users who are not listed in the passwd file. I'm using NIS, so I created a local user, but it fails for that user too:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2011-2012/2815.html
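One difference worth ruling out: deviceQuery above was run from a root prompt (`n2:~#`), while namd2 ran as `cudarulez`. If the NVIDIA device nodes are missing or not accessible to ordinary users, a CUDA program could find no device for that user even though it works as root; this is an assumption about this setup, not a confirmed cause. A minimal sketch that reports, for the current user, whether each node exists and is readable and writable. The function name `check_nodes` is hypothetical, and the node names `/dev/nvidiactl`, `/dev/nvidia0` and `/dev/nvidia1` assume the usual NVIDIA naming for the two cards listed above:

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can open the NVIDIA
# device nodes passed as arguments. Assumes the usual naming
# (/dev/nvidiactl plus one /dev/nvidiaN per GPU).
check_nodes() {
    status=0
    for node in "$@"; do
        if [ ! -e "$node" ]; then
            echo "missing: $node"
            status=1
        elif [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok: $node"
        else
            echo "no access: $node"
            status=1
        fi
    done
    # Non-zero if any node was missing or inaccessible.
    return "$status"
}

echo "checking as user: $(id -un)"
check_nodes /dev/nvidiactl /dev/nvidia0 /dev/nvidia1
```

Comparing the output as root and as `cudarulez` may show whether the failure is a permissions problem rather than a driver problem.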
I don't know how to continue: I searched the list archive for related topics and couldn't find anything else.
Any help would be very much appreciated.
Thanks for your time!
Damián.