From: Andrew Dalke (dalke_at_ks.uiuc.edu)
Date: Sun Apr 13 1997 - 01:28:43 CDT

Hello,

We had planned to talk about other interesting variables to trace in
this week's VMDTech, but that will have to wait for a few weeks.
Instead, this week we'll give another example of how to manipulate
trajectories using the VMD scripting language, in this case by
averaging each frame with its neighboring frames.

In other VMD news, if anyone here is looking for a job or post-doc in
east-central Illinois, or knows someone doing so, we have several
positions open. See http://www.ks.uiuc.edu/Services/Events/ for more
information. We have also reorganized the VMD web pages (thanks to
Jeff Ulrich) and the archive of the VMD mailing list (thanks to Jim
Phillips).

As always, suggestions and comments about VMD or ideas for future
VMDTechs are welcome. Please send them to this list or to
vmd_at_ks.uiuc.edu.

                                                Andrew Dalke
                                                dalke_at_ks.uiuc.edu
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
                        Smoothing Trajectories

One problem with viewing the results of MD simulations is that random
thermal motions often distract you from the motions you are trying to
observe. To reduce the problem, one trick many people use is to
average several frames together to smooth out the random component and
emphasize the non-random one.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        The Basics

As we pointed out in a previous VMDTech on Morphing, atom coordinate
information for each timestep is available to the scripting language.
It is a two step process. First, make an "atom selection" of the
atoms about which you are interested in getting more information. To
do this, you'll also need to know the appropriate molecule id, since
VMD will let you load several molecules. The molecule id is an
integer value or the word "top". The top molecule is designated in
the "Molecules" form and if there is only one molecule, it is by
definition the top molecule. (Note that all of this is describe in
the User's Guide.)

        set sel1 [atomselect top "resid 23"]
        set sel2 [atomselect top "all"]

Information about the selection is returned by using the "get" option,
followed by a list of keywords, as shown in these examples.
        puts [$sel1 get charge]
        puts [$sel1 get {index name}]
        puts [$sel2 get {x y z}]

Some information can be modified, and coordinates are one of them. To
change coordinates, use the "set" option. For now, assume that there
are 5 atoms in "resid 23" ($sel1 from above), then the coordinates can
be changed like this:

        $sel1 set {x y z} {{0 0 0} {1 0 0} {2 1 0} {3 1 0} {4 0 -3}}

(Yes, these coordinates are completely unreasonable.) Notice that the
coordinates are list of lists. It is a list of size N, and each item
on that list contains 3 values, for x, y and z, respectively.

For the scripts described for this VMDTech, we'll only need to get and
set the coordinate information, but we will need to get it over each
timestep. The selections have an option "frame" parameter to let you
access just that. There are a couple of ways to set this value, and
here are a couple of examples, assuming you've stored a selection in
the variable "sel1".

        # get the coordinates for the first timestep
        $sel1 frame 0
        puts "First timestep: [$sel1 get {x y z}]"
        # the sixth
        $sel1 frame 5
        puts "Sixth timestep: [$sel1 get {x y z}]"
        # and the last
        $sel1 frame last
        puts "Last timestep: [$sel1 get {x y z}]"

We'll also need to know how many timesteps are available for a given
molecule, and that can be found using the "molinfo" command. Again,
you will need to know the molecule id, but the example will use "top".

        puts "The top molecule has [molinfo top get numframes] timesteps."

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        Simple Averaging

For the first script, the implementation is pretty easy; average the
coordinates over a window of size N, where N is odd. That is, the new
coordinates for frame a is:
                 (N-1)/2
              1 ----
   xyz'(a) = --- \ xyz(a + r)
              N /
                  ----
                r = -(N-1)/2

  One problem is how to treat cases near the edge where a + n <
0 or a + n >= the number of frames. For this case, we'll just reuse
the first or last frame. (Another option would be to not compute
those terms.)

  The biggest problem is storing the coordinate information. The
coordinates for a given timestep cannot be changed until the original
information was completely used. In other words, once xyz'(a) is
computed, the replacement of xyz'(a) -> xyz(a) cannot be done until
the values of xyz'(a-r) to xyz'(a+r) are computed. Thus, some of the
information must be temporarily stored elsewhere before using it.

  There are several ways to do this. One is to copy all the original
coordinate information to a Tcl array -- making a copy of the full
trajectory. This is the easiest implementation to understand, and is
shown here.

proc simple_average_frame {window_size {molid top}} {
    # check that the window size is correct (must be odd)
    if [expr $window_size % 2 == 0] {
        error "window size must be odd for 'simple_average_frames'"
    }
    # window goes from -r to r
    set r [expr ($window_size - 1) / 2]
    # averaging over all the atoms
    set sel [atomselect $molid all]
    # get the number of frames
    set n [molinfo $molid get numframes]

    # get the coordinate information
    for {set a -$r} {$a <= $n + $r} {incr a} {
        set f $a
        if {$a < 0} {set f 0}
        if {$a >= $n} {set f [expr $n - 1]}
        $sel frame $f
        set xyz($a) [$sel get {x y z}]
    }

    # average over the coordinates
    for {set a 0} {$a < $n} {incr a} {
        puts "Computing frame $a of $n"
        # initial coordinates
        set temp_xyz $xyz([expr $a - $r])
        # sum over the N coordinate sets
        for {set i [expr $a - $r + 1]} {$i <= $a + $r} {incr i} {
            set new_xyz {}
            foreach x1 $temp_xyz x2 $xyz($i) {
                lappend new_xyz [vecadd $x1 $x2]
            }
            set temp_xyz $new_xyz
        }
        set new_xyz {}
        set Winv [expr 1.0 / $window_size]
        foreach x1 $temp_xyz {
            lappend new_xyz [vecscale $Winv $x1]
        }
        # set the coordinates
        $sel frame $a
        $sel set {x y z} $new_xyz
    }
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        Tastes Great -- Less Memory

However, this isn't very useful for large coordinate sets since it
does make a copy of the full animation, and does so using Tcl arrays,
which store numbers as strings rather than floats (though Tcl 8.0 is
supposed to change that). Here is a somewhat more complicated version
that stores only the number of coordinates as specified by the window
size. It is hard to say which is the better program to use; on a
small system, this new version was about 5% faster, but on a large
one, the previous example was about 5% faster. For the most part, it
doesn't matter.

  The coordinates are stored in the queue 'xyz', which ranges from -r
to r. For each new frame, the information is shifted left, and the new
coordinates placed in xyz($r).

# average timesteps given a window size (from -(N-1)/2 to (N-1)/2)
proc average_frames {window_size {molid top}} {
    # check that the window size is correct (must be odd)
    if [expr $window_size % 2 == 0] {
        error "window size must be odd for 'average_frames'"
    }

    # window goes from -r to r
    set r [expr ($window_size - 1) / 2]
    # averaging over all the atoms
    set sel [atomselect $molid all]
    # get the number of frames
    set n [molinfo $molid get numframes]

    # initialize the queue
    # create a set of selections storing the different timesteps
    # initialized to the first timesteps
    for {set i -$r} {$i <= $r} {incr i} {
        if {$i < 0} {
            set f 0
        } else {
            if {$i >= $n} {
                set i [expr $n - 1]
            } else {
                set f $i
            }
        }
        $sel frame $f
        set xyz($i) [$sel get {x y z}]
    }
    # loop over each timestep
    for {set a 0} {$a < $n} {incr a} {
        puts "Computing frame $a of $n"
        # sum over all N coordinates
        set xyz_avg $xyz(-$r)
        for {set i [expr -$r + 1]} {$i <= $r} {incr i} {
            set tmp_xyz {}
            foreach pos $xyz_avg pos2 $xyz($i) {
                lappend tmp_xyz [vecadd $pos $pos2]
            }
            set xyz_avg $tmp_xyz
        }
        # average them
        set tmp_xyz {}
        set Ninv [expr 1.0 / $window_size]
        foreach pos $xyz_avg {
            lappend tmp_xyz [vecscale $Ninv $pos]
        }

        # save the coordinates
        $sel frame $a
        $sel set {x y z} $tmp_xyz

        # shift the stored data by one
        for {set i [expr -$r + 1]} {$i <= $r} {incr i} {
            set xyz([expr $i - 1]) $xyz($i)
        }
        # and add the new term to the coordinate queue
        set f [expr $r + $a + 1]
        if {$f >= $n} {
            set f [expr $n - 1]
        }
        $sel frame $f
        set xyz($r) [$sel get {x y z}]
    }
}

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        Using the Script

  For those of you without a DCD or CRD trajectory file, or without a
set of PDB files you can load into VMD (hmmm, that would be a useful
VMDTech topic!), you can get a PDB and DCD file for bacteriorhodopsin
from http://www.ks.uiuc.edu/Research/vmd/workshop/ where it states:

> The examples required two data files, the PDB structure of
> bacteriorhopsin and a DCD(21 MB!) animation of the retinal being
> pulled out. ( Here is a shorter version but still 4 MB.)

To load these into VMD, read the files as a "pdb and dcd" file in the
Files menu, or load the PDB file then append the DCD file with the
Edit menu.

You may also be able to use the 'urlload' option of VMD to load the
files (semi-)directly. Try the following commands (note that it saves
a temporary file in the directory $env(TMPDIR) and calls a perl script
directly in a manner than may not be supported in future versions):

        set base_url http://www.ks.uiuc.edu/Research/vmd/workshop
        # get the PDB file and load it
        mol urlload pdb $base_url/bR.pdb
        # get the DCD file
        exec $env(VMDDIR)/scripts/vmd/url_get $base_url/bRshort.dcd \
> $env(TMPDIR)/bRshort.dcd
        # and load it
        animate read dcd $env(TMPDIR)/bRshort.dcd
        # you will have to delete the temporary file yourself
        # _after_ VMD has read all the frames
        if {[molinfo top get numframes] == 97} {
                file delete $env(TMPDIR)/bRshort.dcd
        }

With the structure and trajectory loaded, and with the script entered
into VMD (either through a cut-and-paste or by saving it to a file and
'source'ing it) choose a window size (3 and 5 are good options) and
run the 'average_frames' command like this:

        average_frames 5

This will take a while (a bit over seven minutes on my Crimson! :( )
so for your own testing you will want to delete some frames. Use the
"skip" option in the Edit form, for instance, delete every 2 frames
twice, to cut out 3/4 of the frames).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        Weighted Average

  The last two examples gave the same weight to each of the timesteps
used in the averaging. As another possibility, you might want to
weight the terms in the window so the center is emphasized more than
the rest. For example, in a window of size 3 you might want the
middle weighted twice as much as each side (so the weights are {1, 2,
1} -- and if I remember FFTs right, this is called a "triangle
filter"). Here are two such functions.

# returns a list of window_size 1s.
# This is the same as a simple average
proc square_filter {window_size} {
    if [expr $window_size % 2 == 0] {
        error "window size must be odd for 'square_filter'"
    }
    set ret {}
    for {set i 0} {$i < $window_size} {incr i} {
        lappend ret 1
    }
    return $ret
}
# edges are 1, second from edges are 2, ..., middle is (window_size-1)/2
proc triangle_filter {window_size} {
    if [expr $window_size % 2 == 0] {
        error "window size must be odd for 'triangle_filter'"
    }
    set ret {}
    set r [expr ($window_size - 1) / 2]
    for {set i 0} {$i < $r} {incr i} {
        lappend ret $i
    }
    for {set i $r} {$i >= 0} {incr i -1} {
        lappend ret $i
    }
    return $ret
}

And you can add whatever other filters you want (how about a gaussian
curve?)

So I can run

        triangle_filter 11

and get

        Info) 0 1 2 3 4 5 4 3 2 1 0
or

        square_filter 5
to get

        Info) 1 1 1 1 1

Formally, given a list of weights w(r): -(N-1)/2 <= r <= (N-1)/2
where W = sum of all weights w(r),

                 (N-1)/2
              1 ----
   xyz'(a) = --- \ w(r) * xyz(a + r)
              W /
                  ----
                r = -(N-1)/2

and you can see that if w(r) = 1, this is the same as the original
formulation.

These weights can be used with the following procedure, called
"weighted_frames", which takes as input the list of weights (there
must be an odd number of them) and the molecule id. Both of these
parameters have defaults; the first is {1 2 1}, which is a triangle
filter that weights the current frame twice as much as its neighbors.
As usual, the molecule id defaults to the top molecule.

Here are some examples:

        weighted_frames {1 3 1} top
        # identical to "average_frames 3" -- we tested! :)
        weighted_frames [square_filter 3]
        weighted_frames [triangle_filter 5] 2

and the full source is:

# Given a list of weights, compute the weighted average
# For example: weighted_frames {1 2 3 2 1} top
# which is the same as: weighted_frames [triangle_filter 5] top
# The default is a triangle average of window size 3 applied to
# the top moleucle
proc weighted_frames {{filter {1 2 1}} {molid top}} {
    # check that the window size is correct (must be odd)
    set window_size [llength $filter]
    if [expr $window_size % 2 == 0] {
        error "filter size must be odd for 'weighted_frames'"
    }

    # window goes from -r to r
    set r [expr ($window_size - 1) / 2]
    # averaging over all the atoms
    set sel [atomselect $molid all]
    # get the number of frames
    set n [molinfo $molid get numframes]

    # compute the total weight (using a cute Tcl trick
    # to call a VMD function)
    set total_weight [eval "vecadd $filter"]
    # Check that it is somewhat sane
    if {$total_weight == 0} {
        error "cannot use weights that sum to 0 in 'weighted_frames'"
    }
    # save the weights in an array
    set tmp $filter
    for {set i -$r} {$i <= $r} {incr i} {
        set weight($i) [lvarpop tmp]
    }

    # initialize the 'xyz' queue
    # create a set of selections storing the different timesteps
    # initialized to the first timestep
    for {set i -$r} {$i <= $r} {incr i} {
        if {$i < 0} {
            set f 0
        } else {
            if {$i >= $n} {
                set i [expr $n - 1]
            } else {
                set f $i
            }
        }
        $sel frame $f
        set xyz($i) [$sel get {x y z}]
    }
    # loop over each timestep
    for {set a 0} {$a < $n} {incr a} {
        puts "Computing frame $a of $n"
        ##sum over all N coordinates
        # get the first coordinate
        set xyz_avg {}
        foreach pos $xyz(-$r) {
            lappend xyz_avg [vecscale $weight(-$r) $pos]
        }
        # get the rest
        for {set i [expr -$r + 1]} {$i <= $r} {incr i} {
            set tmp_xyz {}
            foreach pos $xyz_avg pos2 $xyz($i) {
                lappend tmp_xyz [vecadd $pos [vecscale $weight($i) $pos2]]
            }
            set xyz_avg $tmp_xyz
        }
        # average them
        set xyz_avg {}
        set Winv [expr 1.0 / $total_weight]
        foreach pos $tmp_xyz {
            lappend xyz_avg [vecscale $Winv $pos]
        }

        # save the coordinates
        $sel frame $a
        $sel set {x y z} $xyz_avg

        # shift the stored data by one
        for {set i [expr -$r + 1]} {$i <= $r} {incr i} {
            set xyz([expr $i - 1]) $xyz($i)
        }
        # and add the new term
        set f [expr $r + $a + 1]
        if {$f >= $n} {
            set f [expr $n - 1]
        }
        $sel frame $f
        set xyz($r) [$sel get {x y z}]
    }
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                Proposed VMD Vector List Commands

After doing these scripts and the morphing scripts from a few weeks
back, we've come to the conclusion that there needs to be a better way
to manipulate lists of vectors. For instance, given the coordinate
and the weight arrays used for 'weighted_frames', the final averaged,
weighted coordinates are computed as:

        set xyz_avg {}
        foreach pos $xyz(-$r) {
            lappend xyz_avg [vecscale $weight(-$r) $pos]
        }
        # get the rest
        for {set i [expr -$r + 1]} {$i <= $r} {incr i} {
            set tmp_xyz {}
            foreach pos $xyz_avg pos2 $xyz($i) {
                lappend tmp_xyz [vecadd $pos [vecscale $weight($i) $pos2]]
            }
            set xyz_avg $tmp_xyz
        }
        # average them
        set tmp_xyz {}
        set Winv [expr 1.0 / $total_weight]
        foreach pos $tmp_xyz {
            lappend tmp_xyz [vecscale $Winv $pos]
        }

The problem is there is no way in VMD to add lists of vectors element-
wise. (In other words, Tcl isn't APL :). Suppose instead there were
two functions 'veclistadd' and 'veclistscale' which operated on
"vector lists", using the reasonable definition. So you could say:

        veclistadd {{1 0 1} {2 3 4}} {{-1 0 -1} {2 2 2}}
        veclistscale 0.5 {{6 8 10} {-1 -2 -3}}

and get
        {0.0 0.0 0.0} {5.0 6.0 7.0}
        {3.0 4.0 5.0} {-0.5 -1.0 -1.5}
respectively.

If these functions existed, the average, weighted coordinates are
computed as:

        set xyz_sum $xyz(-$r)
        for {set i [expr $r + 1]} {$i <= $r} {incr i} {
            set xyz_sum [veclistadd $xyz_sum \
                             [veclistscale $weight($i) $xyz($i)]]
        }
        set xyz_sum [veclistscale [expr 1.0 / $total_weight] $xyz_sum]

which is much easier (IMHO, of course) to understand and use. So in
the final 1.2 version of VMD, there will be a full set of veclist
functions that complement the regular vector functions. And as with
the vector functions, the most commonly ones will also be implemented
in C++ for speed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  If there are problems, comments or suggestions about these examples,
please send them to us. These scripts are making their way to the VMD
script library (http://www.ks.uiuc.edu/Research/vmd/script_library)
and will be in the user's guide, so we want ideas for improvements.

                                                Andrew Dalke
                                                dalke_at_ks.uiuc.edu

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =