From: Gianluca Interlandi (gianluca_at_u.washington.edu)
Date: Thu Jul 12 2012 - 15:58:42 CDT
> What are your simulation parameters:
>
> timestep (and also any multistepping values)
2 fs, SHAKE, no multistepping
> cutoff (and also the pairlist and PME grid spacing)
8-10-12 (switch/cutoff/pairlist distances); PME grid spacing ~1 A
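In NAMD config terms that is roughly the following (a sketch, assuming
8-10-12 means switchdist/cutoff/pairlistdist and that SHAKE is applied
via rigidBonds):

   timestep        2.0   ;# fs
   rigidBonds      all   ;# SHAKE constraints on bonds to hydrogen
   switching       on
   switchdist      8.0
   cutoff          10.0
   pairlistdist    12.0
   PME             yes
   PMEGridSpacing  1.0   ;# ~1 A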
> Have you tried giving it just 1 or 2 GPUs alone (using the +devices)?
Yes, these are the benchmark times:
np 1: 0.48615 s/step
np 2: 0.26105 s/step
np 4: 0.14542 s/step
np 6: 0.10167 s/step
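Relative to one device that is a speedup of about 1.9x on 2 GPUs, 3.3x
on 4, and 4.8x on 6 (0.48615/0.10167 ~ 4.8), i.e. roughly 80% parallel
efficiency on 6 devices.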
I am also posting part of the log from the run on 6 devices, in case it
helps localize the problem:
Pe 4 has 57 local and 64 remote patches and 1066 local and 473 remote computes.
Pe 1 has 57 local and 65 remote patches and 1057 local and 482 remote computes.
Pe 5 has 57 local and 56 remote patches and 1150 local and 389 remote computes.
Pe 2 has 57 local and 57 remote patches and 1052 local and 487 remote computes.
Pe 3 has 58 local and 57 remote patches and 1079 local and 487 remote computes.
Pe 0 has 57 local and 57 remote patches and 1144 local and 395 remote computes.
Gianluca
> Gianluca
>
> On Thu, 12 Jul 2012, Aron Broom wrote:
>
>    have you tried the multicore build? I wonder if the prebuilt smp one
>    is just not working for you.
>
>    On Thu, Jul 12, 2012 at 3:21 PM, Gianluca Interlandi
>    <gianluca_at_u.washington.edu> wrote:
> are other people also using those GPUs?
>
>
>        I don't think so, since I reserved the entire node.
>
>            What are the benchmark timings that you are given after ~1000
>            steps?
>
>
>        The benchmark time with 6 processes is 101 sec for 1000 steps. This
>        is only slightly faster than Trestles, where I get 109 sec for 1000
>        steps running on 16 CPUs. So, yes, 6 GPUs on Forge are much faster
>        than 6 cores on Trestles, but in terms of SUs it makes no
>        difference, since on Forge I still have to reserve the entire node
>        (16 cores).
>
> Gianluca
>
>        (note that the very first timings include some setup time.)
>
>        I often run a system of ~100,000 atoms, and I generally see an
>        order of magnitude improvement in speed compared to the same number
>        of cores without the GPUs. I would test the non-CUDA precompiled
>        code on your Forge system and see how that compares; it might be
>        the fault of something other than CUDA.
>
> ~Aron
>
>        On Thu, Jul 12, 2012 at 2:41 PM, Gianluca Interlandi
>        <gianluca_at_u.washington.edu> wrote:
> Hi Aron,
>
>            Thanks for the explanations. I don't know whether I'm doing
>            everything right. I don't see any speed advantage running on
>            the CUDA cluster (Forge) versus running on a non-CUDA cluster.
>
>            I did the following benchmarks on Forge (the system has 127,000
>            atoms and ran for 1000 steps):
>
> np 1: 506 sec
> np 2: 281 sec
> np 4: 163 sec
> np 6: 136 sec
> np 12: 218 sec
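>            (That is a ~3.7x speedup at np 6 versus np 1; the slowdown at
>            np 12 may simply reflect two processes contending for each of
>            the 6 GPUs.)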
>
>            On the other hand, running the same system on 16 cores of
>            Trestles (AMD Magny Cours) takes 129 sec. It seems that I'm not
>            really making good use of SUs by running on the CUDA cluster.
>            Or maybe I'm doing something wrong? I'm using the
>            ibverbs-smp-CUDA pre-compiled version of NAMD 2.9.
>
> Thanks,
>
> Gianluca
>
> On Tue, 10 Jul 2012, Aron Broom wrote:
>
>                if it is truly just one node, you can use the
>                multicore-CUDA version and avoid the MPI charmrun stuff.
>                Still, it boils down to much the same thing, I think. If
>                you do what you've done below, you are running one job with
>                12 CPU cores and all GPUs. If you don't specify +devices,
>                NAMD will automatically find the available GPUs, so I think
>                the main benefit of specifying them is when you are running
>                more than one job and don't want the jobs sharing GPUs.
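>
>                For the single-node case, a minimal sketch of a
>                multicore-CUDA launch (run.namd is a hypothetical config
>                file name):
>
>                    namd2 +p12 +idlepoll +devices 0,1,2,3,4,5 run.namd
>
>                i.e. 12 worker threads on one node sharing the six GPUs,
>                with no charmrun or machinefile needed.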
>
>                I'm not sure you'll see great scaling across 6 GPUs for a
>                single job, but that would be great if you did.
>
> ~Aron
>
>                On Tue, Jul 10, 2012 at 1:14 PM, Gianluca Interlandi
>                <gianluca_at_u.washington.edu> wrote:
> Hi,
>
>                    I have a question concerning running NAMD on a CUDA
>                    cluster.
>
>                    NCSA Forge has for example 6 CUDA devices and 16 CPU
>                    cores per node. If I want to use all 6 CUDA devices in
>                    a node, how many processes is it recommended to spawn?
>                    Do I need to specify "+devices"?
>
>                    So, if for example I want to spawn 12 processes, do I
>                    need to specify:
>
>                    charmrun +p12 -machinefile $PBS_NODEFILE +devices 0,1,2,3,4,5 namd2 +idlepoll
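>
>                    Note: +devices and +idlepoll are options read by namd2
>                    itself rather than by charmrun, so they usually go
>                    after the binary name; a sketch, with run.namd as a
>                    hypothetical config file name:
>
>                    charmrun +p12 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 0,1,2,3,4,5 run.namd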
>
> Thanks,
>
> Gianluca
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
-----------------------------------------------------
Gianluca Interlandi, PhD gianluca_at_u.washington.edu
+1 (206) 685 4435
http://artemide.bioeng.washington.edu/
Research Scientist at the Department of Bioengineering
at the University of Washington, Seattle WA U.S.A.
-----------------------------------------------------