From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Tue Jan 18 2005 - 10:34:47 CST
Hello Bora,
Can you send the detailed qsub command you use?
I was hoping that you might use the ++debug switch in your command line,
just to see if it outputs anything useful for solving the problem.
When you run with fewer processors, does the job stop at the same spot as
before or somewhere different?
My guess, since a core file is not found, is that you are running over the
length of time allowed for jobs running in the queue on your cluster. The
batch system is simply cutting you off at a preset time and you get a less
than enlightening error message.
Try putting ++debug in your command line and also ask the administrator of
your cluster what the queue time limits are.
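One rough way to test the time-limit guess is to compare the wall-clock time at which the log stops with the queue limit your administrator reports. The Python sketch below is only an illustration: the helper names last_timing and near_limit are made up for this example, and it assumes the log contains NAMD "TIMING:" lines of the form "TIMING: <step>  CPU: ...  Wall: <seconds>, ...", which may differ between NAMD versions.

```python
import re

# Assumed shape of a NAMD timing line (may vary by version):
# TIMING: 500  CPU: 12.3, 0.0246/step  Wall: 12.5, 0.025/step, ...
TIMING_RE = re.compile(r"^TIMING:\s+(\d+)\s+CPU:.*?Wall:\s+([0-9.]+),")


def last_timing(log_text):
    """Return (step, wall_seconds) from the last TIMING line, or None."""
    last = None
    for line in log_text.splitlines():
        match = TIMING_RE.match(line)
        if match:
            last = (int(match.group(1)), float(match.group(2)))
    return last


def near_limit(wall_seconds, limit_seconds, tolerance=0.05):
    """True if the run stopped within `tolerance` (a fraction) of the limit."""
    return abs(wall_seconds - limit_seconds) <= tolerance * limit_seconds
```

If runs on different processor counts stop at different steps but at roughly the same wall-clock time, and that time is close to the queue limit, the batch system is the likely culprit. If they always stop at the same step regardless of elapsed time, look elsewhere.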
Regards
Brian
On Sun, 16 Jan 2005, bora erdemli wrote:
> Hi Brian;
>
> Thank you for your consideration once again.
>
> My system did not create a core file when the simulation
> stopped.
>
> I use "qsub" to submit my jobs.
>
> My jobs always stop at the same spot.
>
> I have never run my jobs with the ++debug switch and/or
> memory_paranoid.
>
> As far as I understood from your e-mails, there is a
> hardware problem. I tried running it with a lower number of
> CPUs and nodes.
>
> Thank you for your suggestions.
> If you have further ones, I would appreciate them.
>
> best Regards
>
> Bora
>
>
>
>
>
> --- Brian Bennion <brian_at_youkai.llnl.gov> wrote:
>
> > Hello Bora,
> >
> > I looked through your files and have some suggestions but no solid
> > answers.
> > First, the number of CPUs you are using (16) is probably too many for
> > the number of atoms you have (14K) over an Ethernet connection. Again,
> > not a solution, but something to keep in mind.
> >
> > Second, do you have a "core" file that gets created when the system
> > crashes?
> >
> > Third, what do you use to submit your jobs (i.e. bsub, psub, qsub,
> > etc.)?
> >
> > Fourth, does the job always die in the same spot?
> >
> > And finally, have you run the job with the ++debug switch and/or
> > memory_paranoid?
> >
> > My first guess is that the Ethernet traffic is overwhelming the ability
> > of NAMD to keep track of atoms properly. The traffic is constant and
> > carries only small amounts of information, which is really hard on most
> > Ethernet cards/networks.
> > Regards
> > Brian
> >
> > On Fri, 14 Jan 2005, bora erdemli wrote:
> >
> > >
> > > Hi;
> > >
> > > I am sending you my configuration and log files plus the error file.
> > > My simulation used the NVT ensemble with periodic boundary
> > > conditions. I applied Langevin dynamics and PME, with a
> > > 10000-step minimization before the simulation.
> > >
> > > I am waiting for your response because I am stuck at this point.
> > >
> > > Thank you for your consideration and for responding every time
> > > I send an e-mail.
> > >
> > >
> > >
> > >
> > > --- Brian Bennion <brian_at_youkai.llnl.gov> wrote:
> > >
> > > >
> > > > Hello,
> > > >
> > > > I have searched the email archives for your initial email, Bora, but
> > > > have been unsuccessful (i.e. searching for bora and segmentation and
> > > > fault).
> > > >
> > > > Can you restate the problem, with all the details that you can?
> > > > Log file snippets are nice, as are details of the system
> > > > configuration (i.e. type of machine: cluster or standalone, OS,
> > > > batch queueing system, etc.).
> > > >
> > > > Thanks
> > > > Brian
> > > >
> > > >
> > > > On Thu, 13 Jan 2005, bora erdemli wrote:
> > > >
> > > > > Hi all;
> > > > >
> > > > > I sent an e-mail on almost the same topic two weeks ago, and I
> > > > > could not find a solution to my problem on the NAMD web site or
> > > > > the mailing list.
> > > > >
> > > > > I got this error in the middle of a simulation. I ran an NVT
> > > > > ensemble simulation with periodic boundary conditions.
> > > > >
> > > > > Do you have any suggestions?
> > > > >
> > > > > Best regards
> > > > >
> > > > >
> > > > >
> > > > > =====
> > > > > Sabri Bora Erdemli
> > > > > Koc University
> > > > > Computational Science and Engineering
> > > > > Research and Teaching Asistant
> > > > > Koc Universitesi pk.218 34550
> > > > > sariyer Istanbul/TURKEY
> > > > > tel no: 02123381736
> > > > > 05326512523
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > __________________________________
> > > > > Do you Yahoo!?
> > > > > Yahoo! Mail - now with 250MB free storage.
> > Learn
> > > > more.
> > > > > http://info.mail.yahoo.com/mail_250
> > > > >
> > > >
> > > >
> > >
> >
> *****************************************************************
> > > > **Brian Bennion, Ph.D.
> > > > **
> > > > **Computational and Systems Biology Division
> > > > **
> > > > **Biology and Biotechnology Research Program
> > > > **
> > > > **Lawrence Livermore National Laboratory
> > > > **
> > > > **P.O. Box 808, L-448 bennion1_at_llnl.gov
> > > > **
> > > > **7000 East Avenue phone: (925) 422-5722
> > > > **
> > > > **Livermore, CA 94550 fax: (925) 424-6605
> > > > **
> > > >
> > >
> >
> *****************************************************************
> > > >
> > > >
> > >
> > >
*****************************************************************
**Brian Bennion, Ph.D. **
**Computational and Systems Biology Division **
**Biology and Biotechnology Research Program **
**Lawrence Livermore National Laboratory **
**P.O. Box 808, L-448 bennion1_at_llnl.gov **
**7000 East Avenue phone: (925) 422-5722 **
**Livermore, CA 94550 fax: (925) 424-6605 **
*****************************************************************
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:40:29 CST