From: John Stone (johns_at_ks.uiuc.edu)
Date: Wed Apr 27 2022 - 13:49:18 CDT

Oh yes, I think we all completely agree.

IMHO, the biggest challenges for full on-demand loading boil down to:
A) the viability of implementing things like full random access support
   and efficient direct I/O for existing trajectory file formats.
and/or
B) convincing simulation tool developers to adopt new binary trajectory
   file formats that do support these things.
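
As a rough illustration of what "full random access" means in (A), here
is a minimal Python sketch. It assumes a made-up toy layout (a 12-byte
header followed by fixed-size frames of float32 coordinates); the
format, the "TOY1" magic, and the function names are all hypothetical
and are not DCD or any VMD plugin API. With a constant atom count, the
byte offset of frame i is simple arithmetic, so a reader can seek
straight to it.

```python
import struct

# Hypothetical toy header: 4-byte magic, atom count, frame count.
HEADER = struct.Struct("<4sII")
BYTES_PER_ATOM = 3 * 4  # x, y, z stored as float32


def write_toy_trajectory(path, frames):
    """Write frames (lists of (x, y, z) tuples, all the same length)."""
    natoms = len(frames[0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"TOY1", natoms, len(frames)))
        for frame in frames:
            if len(frame) != natoms:
                raise ValueError("fixed-size layout needs a constant atom count")
            for xyz in frame:
                f.write(struct.pack("<3f", *xyz))


def read_frame(path, index):
    """Read one frame by seeking to it, without reading earlier frames."""
    with open(path, "rb") as f:
        magic, natoms, nframes = HEADER.unpack(f.read(HEADER.size))
        if magic != b"TOY1":
            raise ValueError("not a toy trajectory file")
        if not 0 <= index < nframes:
            raise IndexError(index)
        frame_bytes = natoms * BYTES_PER_ATOM
        # Constant frame size makes the offset a closed-form expression.
        f.seek(HEADER.size + index * frame_bytes)
        data = f.read(frame_bytes)
    flat = struct.unpack("<%df" % (natoms * 3), data)
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```

A call such as read_frame(path, 7) touches only the header and frame 7,
which is the access pattern an on-demand loader would want.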

It is also worth noting that sometimes the features required for
out-of-core on-demand loading are in direct conflict with other
features like variable atom counts per timestep. Not that there
can't be resolutions to these challenges, but it makes the software
complexity much much higher.
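
A small sketch of where that conflict comes from, in Python and on a
hypothetical toy layout (each record is an atom count followed by that
many float32 xyz triples; none of these names come from VMD or any real
trajectory format). Once the atom count varies per timestep, frame
offsets can no longer be computed, so the reader has to scan the file
once and keep an offset table before it can do any random access.

```python
import struct


def write_variable_frames(path, frames):
    """Write records of: uint32 atom count, then 3 float32 per atom."""
    with open(path, "wb") as f:
        for frame in frames:
            f.write(struct.pack("<I", len(frame)))
            for xyz in frame:
                f.write(struct.pack("<3f", *xyz))


def build_offset_index(path):
    """Scan the whole file once, recording (offset, natoms) per frame."""
    index = []
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            head = f.read(4)
            if len(head) < 4:
                break  # end of file
            (natoms,) = struct.unpack("<I", head)
            index.append((offset, natoms))
            f.seek(natoms * 12, 1)  # skip this frame's coordinates
    return index


def read_frame_indexed(path, index, i):
    """Random access is only possible after the index has been built."""
    offset, natoms = index[i]
    with open(path, "rb") as f:
        f.seek(offset + 4)  # skip the atom-count field
        flat = struct.unpack("<%df" % (natoms * 3), f.read(natoms * 12))
    return [tuple(flat[k:k + 3]) for k in range(0, len(flat), 3)]
```

The extra pass (or an index stored alongside the file and kept in sync
with it) is the kind of added complexity meant above.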

Best,
  John

On Wed, Apr 27, 2022 at 01:38:26PM -0300, Leonardo Palmieri wrote:
> No argument against using BigDCD.
>
> No argument against using catDCD to remove water molecules.
>
> Both work very well.
>
> Using Extensions, e.g. Timeline, is a straightforward, easy and graphical
> way to do analysis. I'm just thinking of another paradigm for using VMD's
> analysis capabilities. On-demand loading of trajectories could also
> enable on-the-fly analysis.
>
> Obviously, I know that it's easy to say, but hard to code and implement.
>
> Thanks!
>
>
>
> 2022-04-27 1:00 GMT-03:00, Geist, Norman <norman.geist_at_uni-greifswald.de>:
> > If most of your system is water and you are not interested in it,
> > consider extracting a DCD without water using catdcd and an index
> > file. Without solvent, trajectories often become comfortably small.
> >
> > Best,
> > Norman Geist
> >
> >
> >
> > On Tuesday, 26-04-2022 at 22:41, John Stone wrote:
> >
> >
> >
> > Hi,
> >   So what is described there isn't what we ended up implementing.
> > There is functionality along these lines in developmental parts of
> > VMD now, but not in the general case yet for a few reasons.
> > 1) I ended up deciding against mmap(), and instead we use another API
> >    known as direct I/O, which is more portable among operating
> >    systems rather than being Unix-only.
> >
> > 2) To get the performance we want for out-of-core I/O (via any API),
> >    it ultimately requires a more purpose-designed trajectory format,
> >    which is what I ended up doing in the so-called ".js" file format,
> >    an early version of which is described here:
> >      https://link.springer.com/chapter/10.1007/978-3-642-24031-7_1
> >
> > 3) As we've implemented an increasing number of analytical and
> >    visualization features with GPU acceleration, ensuring that we
> >    had a way of supporting GPUs with out-of-core I/O became an
> >    increasingly important consideration that was not met by any
> >    existing approach.  There is now a prototype implementation in
> >    VMD using a combination of 2) and 3) here, that can achieve
> >    massive I/O rates (over 70GB/sec for example, to network
> >    attached storage from a single DGX-2 compute node).  This
> >    requires further specialization of the trajectory I/O code,
> >    and I've done it for some early cases, but it needs to become
> >    pervasive through VMD, and this is something that will be done
> >    using modern C++ constructs that require C++14 or later.
> >    Again, here we will still need special file formats to do it
> >    well, so at the outset, it will only be the ".js" format that
> >    is supported.
> >
> > For the time being, using DCD files or other legacy file formats,
> > the bigDCD script or your own "for" loop script is going to be the
> > best way to go, because a lot of the readers for these existing
> > trajectory formats can't do full random access as currently
> > implemented, so to get reasonable performance they'll have to be
> > processed "in-order" at present.
> >
> > If you have specific needs that require random access, let me know
> > more of the details.  So far I haven't heard anything that would be
> > an argument against using BigDCD or similar methods with simple
> > scripting approaches.
> >
> > Best,
> >   John
> >
> > On Tue, Apr 26, 2022 at 04:41:29PM -0300, Leonardo Palmieri wrote:
> >> Well, I also found this:
> >>
> >> I think it was from you, John.
> >>
> >> From: John Stone (johns_at_ks.uiuc.edu)
> >> Date: Fri Feb 15 2008 - 17:03:15 CST
> >>
> >> "...I'm also working on a future design change for the VMD internals
> >> that will enable it to work with trajectories that are far larger
> >> than the amount of physical memory in the machine through a new
> >> out-of-core trajectory plugin API. I will likely implement this first
> >> using my own special trajectory format and use mmap() and related
> >> kernel VM calls to allow VMD to map monstrously huge MD trajectories
> >> into virtual memory. The trick will be to add code for pre-fetching
> >> threads during trajectory analysis and playback, and to give the OS
> >> kernel "hints" about which timesteps need to be in-core and which
> >> ones can optionally be paged out. Later on, I hope to have a more
> >> general implementation that can work with any reasonable trajectory
> >> format (and without the need for mmap()), where VMD will keep a
> >> working set of frames in-core, and will dynamically load/free frames
> >> as analysis/visualization operations demand. This too will attempt to
> >> use scout threads to prefetch frames on-the-fly before they are
> >> needed so that the user "feels" like they were already in memory.
> >> I don't have a timeline for these developments yet, I'll know more
> >> once my experiments with my initial Unix-specific mmap() based
> >> implementation have made significant progress."
> >>
> >> That's what I'm talking about...
> >>
> >> 2022-04-26 16:35 GMT-03:00, Leonardo Palmieri :
> >> > BigDCD and scripts work well, apart from occasional problems...
> >> >
> >> > but the point is:
> >> >
> >> > I'm interested in using the extensions under Extensions > Analysis
> >> > in graphical mode, interactively and remotely accessing a node of
> >> > the computer where the trajectory is stored.
> >> >
> >> > I'm using compressed X11 forwarding to run the graphical VMD
> >> > remotely, but the memory available per node cannot hold the entire
> >> > trajectory, and VMD crashes when it runs out of memory.
> >> >
> >> > That's the reason.
> >> >
> >> > Thanks!
> >> >
> >> > 2022-04-26 16:17 GMT-03:00, John Stone :
> >> >> Can you tell us why the bigDCD script isn't a choice for you?
> >> >>
> >> >> Best,
> >> >>   John
> >> >>
> >> >> On Tue, Apr 26, 2022 at 04:12:18PM -0300, Leonardo Palmieri wrote:
> >> >>> Hi everybody,
> >> >>>
> >> >>> Is there a way to analyse a trajectory without loading the entire
> >> >>> trajectory file into the computer's RAM?
> >> >>>
> >> >>> I know that it is possible by choosing a subset of frames, choosing
> >> >>> a larger stride, or scripting with BigDCD, but none of those is an
> >> >>> option for me. Is there another way?
> >> >>>
> >> >>> Thanks in advance!
> >> >>>
> >> >>>
> >> >>> --
> >> >>> att
> >> >>>
> >> >>> Leonardo Palmieri
> >> >>
> >> >> --
> >> >> NIH Center for Macromolecular Modeling and Bioinformatics
> >> >> Beckman Institute for Advanced Science and Technology
> >> >> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> >> >> http://www.ks.uiuc.edu/~johns/           Phone: 217-244-3349
> >> >> http://www.ks.uiuc.edu/Research/vmd/
> >> >>
> >> >
> >>
> >
>
