From: McGuire, Kelly (mcg05004_at_byui.edu)
Date: Thu Dec 06 2018 - 14:54:30 CST
With the submit script below, the 10 jobs take about 2 minutes to finish, whereas a single job submitted on its own takes 15-20 seconds:
#!/bin/bash
#SBATCH -C rhel7
#SBATCH --time=3-00:00:00 # walltime
#SBATCH --ntasks-per-node=24 # number of processor cores (i.e. tasks)
#SBATCH --nodes=10 # number of nodes
#SBATCH --gres=gpu:4
#SBATCH --mem=64G # memory per node
#SBATCH --nodelist=m9g-2-1
# Compatibility variables for PBS. Delete if not needed.
export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
export PBS_JOBID=$SLURM_JOB_ID
export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"
export PBS_QUEUE=batch
export dir=/panfs/pan.fsl.byu.edu/scr/grp/busathlab/software/namd/exec/NAMD_2.13b2_Linux-x86_64-multicore-CUDA/namd2
# Set the max number of threads to use for programs using OpenMP. Should be <= ppn. Does nothing if the program doesn't use OpenMP.
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
# Run NAMD
$dir Win1/Minimization.conf > log/Minimization1.log &
$dir Win2/Minimization.conf > log/Minimization2.log &
$dir Win3/Minimization.conf > log/Minimization3.log &
$dir Win4/Minimization.conf > log/Minimization4.log &
$dir Win5/Minimization.conf > log/Minimization5.log &
$dir Win6/Minimization.conf > log/Minimization6.log &
$dir Win7/Minimization.conf > log/Minimization7.log &
$dir Win8/Minimization.conf > log/Minimization8.log &
$dir Win9/Minimization.conf > log/Minimization9.log &
$dir Win10/Minimization.conf > log/Minimization10.log &
wait
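A likely reason for the slowdown is that a Slurm batch script executes only on the first node of the allocation, so the ten backgrounded namd2 processes above all start on that one node and share its 24 cores and 4 GPUs. What follows is a minimal sketch of one way to spread them out, assuming each run is launched as its own job step with srun. The names NAMD_BIN, LAUNCHER, and run_windows are hypothetical and introduced only for this example. The srun options are standard Slurm flags, but step placement depends on the Slurm version and site configuration, so this has not been verified on this cluster.

```shell
#!/bin/bash
# Sketch only: start one NAMD run per job step so Slurm can place each on its own node.
# NAMD_BIN, LAUNCHER, and run_windows are hypothetical names for this example.
NAMD_BIN=${NAMD_BIN:-namd2}      # path to the namd2 binary (assumed to be on PATH here)
LAUNCHER=${LAUNCHER:-srun}       # set to "echo srun" to print commands instead of running them

run_windows() {
    local n_windows=$1
    local i
    for i in $(seq 1 "$n_windows"); do
        # Each step asks for one node, one task with 24 cores, and 4 GPUs.
        # --exclusive requests dedicated CPUs for the step (assumed to be
        # supported by the local Slurm version; newer releases also offer --exact).
        $LAUNCHER --nodes=1 --ntasks=1 --cpus-per-task=24 --gres=gpu:4 --exclusive \
            "$NAMD_BIN" "Win$i/Minimization.conf" > "log/Minimization$i.log" &
    done
    # Block until every backgrounded step has finished.
    wait
}
```

Inside the submit script, calling `run_windows 10` would take the place of the ten explicit launch lines and the final wait. Setting LAUNCHER="echo srun" writes the generated command lines into the log files without starting NAMD, which is a cheap way to check them before submitting.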
Kelly L. McGuire
PhD Candidate
Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602
________________________________
From: Bennion, Brian <bennion1_at_llnl.gov>
Sent: Thursday, December 6, 2018 1:51:38 PM
To: McGuire, Kelly; namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Parallel Jobs
NAMD has no resource scheduler for clusters. Please share your submit script so that we can see what is being attempted.
Thanks
Brian
On December 6, 2018 at 12:41:55 PM PST, McGuire, Kelly <mcg05004_at_byui.edu> wrote:
I just tried, for the first time, submitting 10 minimization jobs from the same bash submit script. Usually, using the CUDA NAMD version with 4 GPUs, an individual minimization job finishes in 15-20 seconds. If I try submitting 10 minimization jobs on 10 nodes, with 24 CPUs per node, 64 GB per node, and 4 GPUs per node, it takes about 2 minutes to finish the minimization jobs.
It seems that each job is not getting its own node and set of GPUs. How does NAMD handle parallel jobs like this?
Kelly L. McGuire
PhD Candidate
Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602
This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:21:34 CST