Xmipp

classify_pca

(syntax changed as of version 1.2)

Purpose

PCA (Principal Component Analysis) is a linear mapping technique. It is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. Once the components have been computed, you can project your data onto them using the projectPCA program.
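The procedure described above can be sketched in plain Python: center each variable, build the covariance matrix, and diagonalize it. This is an illustrative sketch, not classify_pca's implementation. The helper names (`mean_center`, `covariance`, `jacobi_eigen`, `pca`) are hypothetical. The covariance uses the sample normalization 1/(n - 1), which is an assumption (the program may scale its eigenvalues differently). The eigen-decomposition uses the cyclic Jacobi method, chosen only because it fits in a few lines without external libraries.

```python
import math


def mean_center(data):
    """Subtract each variable's (column's) mean from every row."""
    n, dim = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in data], means


def covariance(centered):
    """Sample covariance matrix (dim x dim), normalized by n - 1 (assumption)."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def jacobi_eigen(matrix, tol=1e-24, max_sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix via cyclic Jacobi rotations.

    Returns (values, vectors) where vectors[k] is the unit eigenvector for values[k].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]                              # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]   # accumulated rotations
    scale = sum(x * x for row in a for x in row) or 1.0         # invariant under rotation
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:                                  # relative convergence test
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q]; the smaller root t is numerically stable.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                              # A <- A G (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                              # A <- G^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                              # V <- V G
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[i][k] for i in range(n)] for k in range(n)]   # columns of V
    return values, vectors


def pca(data):
    """Return (eigenvalues, eigenvectors, means), sorted by decreasing eigenvalue."""
    centered, means = mean_center(data)
    values, vectors = jacobi_eigen(covariance(centered))
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    return [values[k] for k in order], [vectors[k] for k in order], means
```

For example, `pca([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])` puts all of the variance (10/3 with this normalization) on the direction (1, 1)/√2, up to sign, and leaves a zero eigenvalue for the perpendicular direction.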

Usage

$ classify_pca ...

Parameters

  • -i [Input data file] The input data file (raw file). It should be a text file in which each row represents one data item (vector) and each column represents one variable. It must have the following format:
       3 1000
       12 34 54
       -12 45 76
       ...
       32 45 76
       
    The first line gives the dimension of the vectors (here 3) and the number of vectors (here 1000). Vector components (variables) are separated by blank spaces. Optionally, an additional last column can hold a label for each vector. Example:
       3 1000
       12 34 54     labelA
       -12 45 76   labelB
       ...
       32 45 76     labelN 
       
  • -o [basename] Base name for the generated output files. PCA produces two output files: basename.evec, which stores the eigenvectors, and basename.eval, which stores the eigenvalues. Example of a .evec file:
    3  3
    0.4   0.2  0.8
    -0.1  0.3   0.4
    ...
    0.2   0.5  0.7
       
    The first line gives the dimension of the vectors (here 3, but it depends on the dimensionality of the input data) and the number of vectors (here also 3). Each remaining line holds one eigenvector.
  • -verb [level] Amount of information printed while running:
    • 0 No information
    • 1 Progress bar with the elapsed time and estimated time to finish (default)
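To check a data file against the input layout, or to inspect the .evec layout, the formats described above can be handled with a short Python sketch. The function names `parse_input` and `format_evec` are hypothetical helpers, not part of Xmipp. Treating any tokens after the first `dim` numbers as the label, and the `%g`-style number formatting of the output, are assumptions.

```python
def parse_input(text):
    """Parse the classify_pca input layout.

    First non-empty line: '<dim> <count>'. Each following line: <dim> numbers
    separated by blanks, optionally followed by a label.
    Returns (vectors, labels); labels[i] is None when row i has no label.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    dim, count = int(rows[0][0]), int(rows[0][1])
    vectors, labels = [], []
    for i, tokens in enumerate(rows[1:], start=1):
        if len(tokens) < dim:
            raise ValueError(f"data row {i}: expected {dim} components, got {len(tokens)}")
        vectors.append([float(tok) for tok in tokens[:dim]])
        labels.append(" ".join(tokens[dim:]) or None)   # extra tokens = label (assumption)
    if len(vectors) != count:
        raise ValueError(f"header announces {count} vectors, found {len(vectors)}")
    return vectors, labels


def format_evec(eigenvectors):
    """Render eigenvectors in the .evec layout: a '<dim> <count>' header
    followed by one eigenvector per line."""
    dim = len(eigenvectors[0]) if eigenvectors else 0
    lines = [f"{dim} {len(eigenvectors)}"]
    lines += [" ".join(f"{x:g}" for x in vec) for vec in eigenvectors]
    return "\n".join(lines) + "\n"
```

Checking the header count against the number of rows catches truncated files early, before any PCA computation is attempted.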

Examples and notes

Example 1: Find the principal components of a data set stored in "test.dat"

$ classify_pca  -i test.dat -o testPCA

With these arguments, the program uses the following files:

Input data file : test.dat
Output files : testPCA.evec (eigenvectors)
               testPCA.eval (eigenvalues)
Algorithm information output file : testPCA.inf

The resulting eigenvectors (eigenimages) are stored in testPCA.evec, and their corresponding eigenvalues in testPCA.eval. The algorithm information file is stored in testPCA.inf.

Now, you can project your data using the projectPCA program.
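Conceptually, projecting a vector onto the principal components means taking the dot product of the mean-centered vector with each of the first k eigenvectors. The sketch below reads a .evec file in the layout described above and does exactly that. The names `read_evec`, `project` and `reconstruct` are hypothetical. Whether projectPCA subtracts the mean (and where it obtains it) is an assumption not documented on this page, as is the orthonormality of the stored eigenvectors.

```python
def read_evec(text):
    """Read the .evec layout: '<dim> <count>' header, then one eigenvector per line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    dim, count = int(rows[0][0]), int(rows[0][1])
    vectors = [[float(tok) for tok in row[:dim]] for row in rows[1:1 + count]]
    if len(vectors) != count or any(len(v) != dim for v in vectors):
        raise ValueError("malformed .evec data")
    return vectors


def project(vector, mean, eigenvectors, k):
    """Coordinates of `vector` along the first k eigenvectors (assumes mean-centering)."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(c * e for c, e in zip(centered, ev)) for ev in eigenvectors[:k]]


def reconstruct(coords, mean, eigenvectors):
    """Approximate the original vector from its first len(coords) coordinates,
    assuming orthonormal eigenvectors."""
    out = list(mean)
    for coef, ev in zip(coords, eigenvectors):
        out = [o + coef * e for o, e in zip(out, ev)]
    return out
```

Keeping only k of the dim coordinates is what reduces the dimensionality; comparing `reconstruct(project(x, mean, evecs, k), mean, evecs)` with x shows how much of each vector those k coordinates retain.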

-- AlfredoSolano - 25 Jan 2007