The reason people advocate concatenating different systems before running PCA is to establish a consistent coordinate/axis system. In such a combined analysis the principal components have the same meaning for every system, so modes from different systems can be related to one another directly.

However, that approach has a significant flaw: if a particular motion is absent in the mutant system, the corresponding PC becomes a minor component of the combined analysis and may not rank highly enough to be examined (in Cartesian PCA you have as many PCs as you have coordinates, three per atom). Thus, a combined analysis requires significantly more examination to capture such differences.
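To make the dilution effect concrete, here is a toy sketch with made-up numbers (not data from any real simulation). It assumes two fixed, uncorrelated motions, "A" and "B", so the variance along each one plays the role of a PC eigenvalue. Motion A is present only in the hypothetical wild type.

```python
from statistics import pvariance

# Toy projections (arbitrary units) of each frame onto two fixed, orthogonal
# motions "A" and "B". All values are invented for illustration only.
wild_type = {"A": [-2.0, 2.0, -2.0, 2.0], "B": [-1.5, -1.5, 1.5, 1.5]}
mutant = {"A": [0.0, 0.0, 0.0, 0.0], "B": [-1.5, -1.5, 1.5, 1.5]}


def variances(*systems):
    """Variance along each motion after pooling the frames of all systems."""
    return {m: pvariance([x for s in systems for x in s[m]]) for m in ("A", "B")}


def ranking(var):
    """Motions sorted from largest to smallest variance (i.e. PC order)."""
    return sorted(var, key=var.get, reverse=True)


wt_only = variances(wild_type)
pooled = variances(wild_type, mutant)
print("wild type only:", wt_only, ranking(wt_only))
print("pooled:        ", pooled, ranking(pooled))
```

In this toy case motion A is the top-ranked component for the wild type alone, but its variance is halved once the mutant frames are pooled in, and it drops to second place. With thousands of PCs, the same effect can push a system-specific motion well down the list.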

If you don't combine the analyses, you have to find a way to identify which PCs are comparable between the different simulations, which is a tractable though computationally intensive problem. This is how I used to approach the problem: as separate simulations with independent PCAs.
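One way to decide which PCs are comparable is to take inner products between the eigenvectors of the two analyses. The sketch below is an illustration, not necessarily the procedure described above. It assumes both mode sets are defined on the same atoms in the same order, that both trajectories were fitted to a common reference, and that each set holds the same number of modes. The function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def overlap_matrix(modes_a, modes_b):
    """|u_i . v_j| for every pair of unit-normalized modes.

    The absolute value is used because the sign of a PC is arbitrary.
    """
    a = [normalize(u) for u in modes_a]
    b = [normalize(v) for v in modes_b]
    return [[abs(dot(u, v)) for v in b] for u in a]


def best_matches(overlaps):
    """For each mode of system A, the index of the most similar mode of B."""
    return [max(range(len(row)), key=row.__getitem__) for row in overlaps]


def rmsip(overlaps):
    """Root-mean-square inner product of the two mode sets (0 to 1)."""
    n = len(overlaps)
    return math.sqrt(sum(o * o for row in overlaps for o in row) / n)


# Toy eigenvectors: the mutant has the same two motions, but in swapped
# order and with one sign flipped.
modes_wt = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
modes_mut = [[0.0, 1.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]

overlaps = overlap_matrix(modes_wt, modes_mut)
print(best_matches(overlaps))
print(rmsip(overlaps))
```

Note that `best_matches` is a simple per-row maximum, so two modes of one system can map to the same mode of the other. Real eigenvectors rarely match one-to-one, so the full overlap matrix is usually worth inspecting.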

At this point, you're running into the real limits of the basic DCD file structure, not the limits of CATDCD. The DCD header can specify the number of atoms in the record, but CATDCD doesn't (or can't) implement this.
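A possible workaround, offered as a suggestion rather than something established above, is to reduce every trajectory to the atoms that all systems share (for example backbone or Cα atoms), so that each frame has the same atom count. The sketch below only does the bookkeeping. It assumes atoms can be identified by a (residue number, atom name) pair that is consistent between systems, and the atom lists are hypothetical.

```python
def common_atom_indices(atoms_a, atoms_b):
    """Indices into each atom list for the atoms present in both systems.

    Atoms are identified by (residue number, atom name). This is an assumed
    convention; it requires consistent numbering and naming in both systems.
    """
    index_b = {key: i for i, key in enumerate(atoms_b)}
    pairs = [(i, index_b[key]) for i, key in enumerate(atoms_a) if key in index_b]
    return [i for i, _ in pairs], [j for _, j in pairs]


def select(frame, indices):
    """Keep only the chosen atoms of one frame (a list of per-atom coordinates)."""
    return [frame[i] for i in indices]


# Hypothetical example: residue 2 has a CB atom in the wild type but not
# in the mutant.
wt_atoms = [(1, "N"), (1, "CA"), (1, "C"), (2, "N"), (2, "CA"), (2, "CB"), (2, "C")]
mut_atoms = [(1, "N"), (1, "CA"), (1, "C"), (2, "N"), (2, "CA"), (2, "C")]

wt_idx, mut_idx = common_atom_indices(wt_atoms, mut_atoms)
print(wt_idx)   # indices to keep from every wild-type frame
print(mut_idx)  # indices to keep from every mutant frame
```

The two index lists come out in matching order, so atom k of a reduced wild-type frame corresponds to atom k of a reduced mutant frame. Matching by name alone can pair side-chain atoms from different residue types, which is one reason to restrict the selection to backbone atoms.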

I would like to concatenate some mutant simulations with wild-type simulations to run PCA. However, catdcd will not let me concatenate all the simulations into a single DCD file, because the point mutations I introduced give the wild type and the mutants different numbers of atoms. How can I combine these files to run an accurate PCA? I have heard that it is necessary to combine all simulations into a single file to compare the wild type and mutant, as this ensures PC1, PC2, etc. are the same between the various simulations. I appreciate any advice!