Xmipp

xmipp_ml_tomo (v3.0)

Usage

Align and classify 3D images with missing data regions in Fourier space, e.g. subtomograms or RCT reconstructions, by a 3D multi-reference refinement based on a maximum-likelihood (ML) target function. For several cases, this method has been shown to both align and classify in a completely reference-free manner, starting from random assignments of the orientations and classes. The mathematical details of this approach are explained in

Scheres et al. (2009) Structure, 17, 1563-1572

Please cite this paper if this program is of use to you!
There is also a standardized Python script, xmipp_protocol__mltomo.py, for this program. Rather than executing the command-line options explained below, the user can submit jobs through a convenient GUI in the Xmipp_protocols, although we still recommend reading this page carefully in order to fully understand the options given in the protocol. Note that this protocol is available from the main xmipp_protocols setup window by pressing the Additional protocols button.

Parameters

--missing <metadata=>
Metadata file with missing data region definitions

Angular sampling

--psi_sampling <float=-1.>
Angular sampling rate for the in-plane rotations (in degrees)

Regularization

--reg_steps <int=5>
Number of iterations in which the regularization is changed from reg0 to regF

Others

--thr <int=1>
Number of shared-memory threads to use in parallel

Additional options:

--noimp_threshold <float=1.>
Threshold to avoid division by zero for weighted averaging

Examples and notes

Input images file

The input metadata should contain the columns image and missingRegionNumber, giving the subtomogram filename and the missing region number, respectively. It can also contain columns with angle and shift information. The output will be a metadata file with the same format. An example follows:

 # XMIPP_STAR_1 *
 #             
 data_          
 loop_          
  _image        
  _missingRegionNumber  
  _angleRot     
  _angleTilt    
  _anglePsi     
  _shiftX       
  _shiftY       
  _shiftZ       
  _ref          
  _logLikelihood        
   32_000001.scl  1.00 0.000000 0.000  0.000  0.000000     0.000000  0    0.000000 1
   32_000002.scl  1.00 0.000000 0.000  0.000  0.000000     0.000000  0    0.000000 1

Input missing regions file

 # XMIPP_STAR_1 *
 # Wedgeinfo
 data_
 loop_          
  _missingRegionNumber 
  _missingRegionType   
  _missingRegionThetaY0
  _missingRegionThetaYF
   1  wedge_y  -64  64

The first column, missingRegionNumber (starting at 1), is required for each type of missing region; this number should appear in the input images metadata.
Here missingRegionType can be one of the following:

  • wedge_y for a missing wedge where the tilt axis is along Y; columns missingRegionThetaY0 and missingRegionThetaYF are used
  • wedge_x for a missing wedge where the tilt axis is along X; columns missingRegionThetaX0 and missingRegionThetaXF are used
  • pyramid for a missing pyramid where the tilt axes are along Y and X; the same columns as for wedge_y and wedge_x are used
  • cone for a missing cone (pointing along Z); column missingRegionThetaY0 is used
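
As a rough illustration of the layout described above, the following Python sketch writes out the text of a missing-regions file containing only wedge_y entries. The function name format_wedge_y_metadata is hypothetical and not part of Xmipp; the column names are taken from the example above, and the exact whitespace accepted by the metadata reader is an assumption.

```python
def format_wedge_y_metadata(wedges):
    """Return the text of a missing-regions file with wedge_y entries.

    `wedges` is a list of (theta_y0, theta_yf) tuples in degrees.
    Region numbers start at 1, as required by the format above.
    This helper is illustrative only and is not an Xmipp function.
    """
    lines = [
        "# XMIPP_STAR_1 *",
        "# Wedgeinfo",
        "data_",
        "loop_",
        " _missingRegionNumber",
        " _missingRegionType",
        " _missingRegionThetaY0",
        " _missingRegionThetaYF",
    ]
    for number, (theta_y0, theta_yf) in enumerate(wedges, start=1):
        lines.append(f"  {number}  wedge_y  {theta_y0}  {theta_yf}")
    return "\n".join(lines) + "\n"


# Two tilt series with different tilt ranges give two missing regions.
text = format_wedge_y_metadata([(-64, 64), (-60, 60)])
print(text)
```

Each subtomogram in the input images metadata would then refer to one of these regions through its missingRegionNumber value.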

Examples and notes

Reference-free alignment of a single class with increasingly fine angular samplings

In total 25 iterations will be performed. The run is started from a weighted average structure obtained from random orientations of all particles (i.e. probably some sort of blob). The first run performs 15 iterations with an angular sampling of 15 degrees and exhaustive searches, the second run 5 iterations with an angular sampling of 10 degrees and search ranges of 50 degrees, and the third run 5 iterations with a sampling of 5 degrees and search ranges of 25 degrees. In the first run, small images (of size 32x32x32) are used to speed up the computationally expensive exhaustive searches. In the next runs the full-sized images are used, but the maximum resolution taken into account is limited to 0.35 pixel^-1. In all runs, the angular samplings will be perturbed by a different random rotation in each iteration.

mkdir run1_align
ml_tomo -i images.sel --oroot run1_align/nref1_15deg --nref 1 --doc images.doc --missing wedges.doc --iter 15 --ang 15 --dim 32 --perturb
ml_tomo -i images.sel --oroot run1_align/nref1_10deg --nref 1 --doc run1_align/nref1_15deg_it000015.doc --keep_angles --missing wedges.doc --iter 5 --ang 10 --ang_search 50 --maxres 0.35 --perturb
ml_tomo -i images.sel --oroot run1_align/nref1_5deg --nref 1 --doc run1_align/nref1_10deg_it000005.doc --keep_angles --missing wedges.doc --iter 5 --ang 5 --ang_search 25 --maxres 0.35 --perturb
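
Each run in this sequence takes as its --doc input the document file written by the last iteration of the previous run. The Python sketch below shows that chaining, assuming the output naming pattern visible in the commands above (the --oroot value followed by _it and a six-digit iteration number). The helper names docfile_for and build_commands are hypothetical, and the sketch only assembles the command lines; it does not run them.

```python
def docfile_for(oroot, iteration):
    """Doc file name of a given iteration, following the pattern in the examples."""
    return f"{oroot}_it{iteration:06d}.doc"


# The three stages of the example: coarse and exhaustive first, then finer and local.
stages = [
    {"oroot": "run1_align/nref1_15deg", "iter": 15,
     "extra": ["--ang", "15", "--dim", "32"]},
    {"oroot": "run1_align/nref1_10deg", "iter": 5,
     "extra": ["--ang", "10", "--ang_search", "50", "--maxres", "0.35"]},
    {"oroot": "run1_align/nref1_5deg", "iter": 5,
     "extra": ["--ang", "5", "--ang_search", "25", "--maxres", "0.35"]},
]


def build_commands(stages, first_doc="images.doc"):
    """Return one argument list per stage, feeding each stage's doc file to the next."""
    commands = []
    doc = first_doc
    for index, stage in enumerate(stages):
        cmd = ["ml_tomo", "-i", "images.sel", "--oroot", stage["oroot"],
               "--nref", "1", "--doc", doc]
        if index > 0:
            # Later stages start from the angles found by the previous stage.
            cmd.append("--keep_angles")
        cmd += ["--missing", "wedges.doc", "--iter", str(stage["iter"])]
        cmd += stage["extra"] + ["--perturb"]
        commands.append(cmd)
        doc = docfile_for(stage["oroot"], stage["iter"])
    return commands


commands = build_commands(stages)
for cmd in commands:
    print(" ".join(cmd))
```

The argument lists could be passed to subprocess.run once the output directory exists.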

Classification into 3 classes after the images have already been aligned against a single reference

In this example, the aligned data set from the previous example is divided into three classes. The angles from the previous iteration are kept in the initial reference generation, so that the three initial references will be aligned. Then, local angular searches around these angles are performed, so that the particles may re-adjust their orientation as the references improve due to the classification into distinct classes. This is done in two stages, one with an initial coarser sampling and larger search range, and a second one with finer sampling and a more limited search range. To prevent getting stuck in local minima in the early stages of the classification, a regularization is applied in the first run that imposes similarity on the three references during the first five iterations.

ml_tomo -i images.sel --oroot run2_3classes/nref3_10deg  --nref 3 --doc run1_align/nref1_5deg_it000005.doc --keep_angles --missing wedges.doc --iter 20 --ang 10 --ang_search 50 --maxres 0.35 --perturb --reg0 5 --regF 0 --reg_steps 5
ml_tomo -i images.sel --oroot run2_3classes/nref3_5deg --nref 3 --doc run2_3classes/nref3_10deg_it000020.doc --keep_angles --missing wedges.doc --iter 5 --ang 5 --ang_search 25 --maxres 0.35 --perturb

Note that a MUCH faster classification may be obtained by keeping the angles completely fixed and only performing a separation into classes. This will only work if the orientations are not (much) affected by the alignment against the single consensus average. In this case, one also has the option to provide a mask (1=to be classified, 0=to be ignored) to focus the classification on an interesting area in the images (not tested extensively yet). The syntax would be:

ml_tomo -i images.sel --oroot run2_3classes/nref3_noalign  --nref 3 --doc run1_align/nref1_5deg_it000005.doc --keep_angles --missing wedges.doc --iter 20 --dont_align --maxres 0.35 --reg0 5 --regF 0 --reg_steps 5 --mask interesting.msk 

Simultaneously align and classify a data set into 3 classes

In the two examples above, the images were first aligned in a reference-free manner using a single class and then classified into three classes. This process may be combined in a single run by using a three-reference refinement where both the initial class assignments and the initial orientation assignments are random. Again, to speed up the process, small images and relatively coarse angular samplings are used. As explained above, subsequent runs may be performed with finer angular samplings, bigger images etc.

ml_tomo -i images.sel --oroot run3_3classes/nref3_15deg --nref 3 --doc images.doc --missing wedges.doc --iter 25 --ang 15 --dim 32 --perturb --reg0 5 --regF 0 --reg_steps 5

Alignment against an external reference with increasingly fine angular samplings

Because the Gaussian distributions inside the ML calculations use squared residuals as distance metric, they are highly sensitive to the absolute intensities in the reference. As long as your reference comes from the same data set you want to align (e.g. from the reference-free protocols mentioned above) this is not a problem. However, often we have an external reference structure that is not on the correct intensity (or grey) scale. In that case it is better to use a constrained cross-correlation coefficient (which is normalized and therefore invariant to the intensity scale). In the example below the same reference is used in three subsequent runs. This reference is assumed to be a nice one (e.g. from the EMDB or PDB) and only the angular sampling is gradually decreased.

ml_tomo -i images.sel --oroot run4_extref/15deg --ref myreference.vol --doc images.doc --missing wedges.doc --iter 1 --ang 15 
ml_tomo -i images.sel --oroot run4_extref/7deg --ref myreference.vol --doc run4_extref/15deg_it000001.doc --missing wedges.doc --iter 1 --ang 7 --ang_search 20
ml_tomo -i images.sel --oroot run4_extref/3deg --ref myreference.vol --doc run4_extref/7deg_it000001.doc --missing wedges.doc --iter 1 --ang 3 --ang_search 10
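
The sensitivity to the grey scale mentioned above can be seen in a small numerical sketch. The Python code below compares a squared-residual distance with a plain normalized cross-correlation coefficient on a toy one-dimensional signal. It is only an illustration of the scale-invariance argument: it does not implement the constrained cross-correlation used by the program (which also accounts for the missing regions), and the function names are hypothetical.

```python
import math


def squared_residual(a, b):
    """Sum of squared differences, the distance underlying the Gaussian model."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalized_cc(a, b):
    """Cross-correlation coefficient after removing the means and normalizing."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator


image = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
reference = list(image)                          # same grey scale as the image
rescaled = [10.0 * v + 3.0 for v in reference]   # same shape, different grey scale

print(squared_residual(image, reference))  # identical signals: zero residual
print(squared_residual(image, rescaled))   # large residual caused only by the scale
print(normalized_cc(image, rescaled))      # close to 1 despite the different scale
```

The rescaled reference has exactly the same shape as the image, yet only the normalized coefficient recognizes this.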

F.A.Q.

1. How should I prepare my data?

  • It is not necessary to downscale your images beforehand to obtain faster results, as this can be done internally with the --dim option of the program.

  • Any density that is not related to the molecule you want to average will interfere with your alignment, and probably even more with your classification. Therefore, try to avoid images with strong densities from gold particles, neighbouring molecules or other artifacts. Note that masking these densities out may lead to an under-estimation of the standard deviation of the noise, which is usually harmful in ML estimation. Therefore, windowing your particles more tightly may be a better option, although extensive testing on this issue has not yet been performed.

  • The probability calculations (i.e. the similarity measures) are based on squared differences between the reference and the experimental images. This makes them highly sensitive to differences in grey-scale intensity or background mean. Therefore, normalization of your input data is important. One can use the xmipp_normalize program, with the -vol option to set the average to zero and the standard deviation to one for each image. Alternatively, one can do this within a binary mask (1=protein, 0=solvent) using the following script:
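#!/bin/csh -f
#
# Normalize a volume using the statistics measured within a binary mask.
# Usage: normalize_within_mask Vin mask Vout
#
set volin = $1
set mask = $2
set volout = $3
if ( $# == 3 ) then
 # Fields 6 and 8 of the last statistics line are used as average and standard deviation
 set avesig=`xmipp_statistics -i $volin --mask $mask | tail -1|awk '{print $6,$8}'`
 echo $avesig
 # Subtract the average, then divide by the standard deviation
 xmipp_operate -i $volin -minus $avesig[1] -o $volout
 xmipp_operate -i $volout -divide  $avesig[2] -o $volout
else
  echo "Usage: normalize_within_mask Vin mask Vout"
endif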
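
For readers who want to check the arithmetic of this normalization, here is a minimal Python sketch of the same idea on a flattened list of voxel values: the average and standard deviation are measured only where the mask equals 1, and then applied to every voxel. The function name is hypothetical, and the use of the population standard deviation (dividing by N) is an assumption; the estimator reported by xmipp_statistics may differ.

```python
import math


def normalize_within_mask(values, mask):
    """Subtract the in-mask average and divide by the in-mask standard deviation.

    `values` and `mask` are flat sequences of equal length; mask entries are 0 or 1.
    All voxels are transformed, but only voxels with mask == 1 define the statistics.
    """
    inside = [v for v, m in zip(values, mask) if m == 1]
    mean = sum(inside) / len(inside)
    # Population standard deviation (an assumption, see the note above).
    std = math.sqrt(sum((v - mean) ** 2 for v in inside) / len(inside))
    return [(v - mean) / std for v in values]


values = [2.0, 4.0, 6.0, 100.0]
mask = [1, 1, 1, 0]  # the last voxel (e.g. a gold bead) is excluded from the statistics
normalized = normalize_within_mask(values, mask)
print(normalized)
```

After this step the masked voxels have zero average and unit standard deviation, while voxels outside the mask are rescaled with the same two numbers.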

2. What is the convention of the Euler angles and translations?

All input and output angles and translations describe the transformation that brings the experimental images onto the reference structure(s). First the translations in x, y and z are applied, and then the rotation is applied (both according to XmippOld's convention).

Note that one can convert these transformations to other conventions. For example, Julio Ortiz (Martinsried) worked out the following conversion to the TOM Toolbox convention:

phi_tom = 90 - rot_xmipp
psi_tom = 270 - psi_xmipp
theta_tom = tilt_xmipp
xoff_tom = xoff_xmipp 
yoff_tom = yoff_xmipp 
zoff_tom = zoff_xmipp 

But then, the transformation in TOM is used to bring the reference onto the experimental images, so that the order of the angles must be changed and the origin offsets should be inverted:

phi_tom_b = -psi_tom
theta_tom_b = -theta_tom
psi_tom_b = -phi_tom
xoff_tom_b = -xoff_tom
yoff_tom_b = -yoff_tom
zoff_tom_b = -zoff_tom
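
The two conversion steps above can be combined in a short Python sketch. The function name xmipp_to_tom and the dictionary keys are hypothetical; the formulas are exactly those listed above, angles are in degrees, and no wrapping of the angles into a particular range is attempted.

```python
def xmipp_to_tom(rot, tilt, psi, xoff, yoff, zoff):
    """Convert Xmipp angles and shifts to the TOM Toolbox values listed above.

    Returns the transformation that brings the reference onto the
    experimental image, i.e. the second set of formulas.
    """
    # Step 1: change of angle convention (image onto reference).
    phi_tom = 90 - rot
    psi_tom = 270 - psi
    theta_tom = tilt

    # Step 2: invert the direction (reference onto image): the first and
    # last angles are swapped and negated, and the offsets are negated.
    return {
        "phi": -psi_tom,
        "theta": -theta_tom,
        "psi": -phi_tom,
        "xoff": -xoff,
        "yoff": -yoff,
        "zoff": -zoff,
    }


tom = xmipp_to_tom(rot=30, tilt=40, psi=50, xoff=1, yoff=-2, zoff=3)
print(tom)
```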
