Xmipp

Metadata Class

Introduction

In Xmipp, information is transferred between programs using the MetaData class. Metadata are stored in STAR files, which are described here. A complete list of valid labels is available in the file metadata_sql.h. Metadata are read into memory as tables in a SQLite database.

Each label is stored in a simple class. This class relates the label id (MDL_XXXX) with a string (that will be used when writing or reading metadata files) and a data type.
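The relation between a label id, its file string and its data type can be pictured with a small stand-alone sketch. Everything in it (LabelId, LabelType, LabelInfo, labelTable, labelToString, labelToType and the three sample labels) is a hypothetical stand-in chosen for illustration, not the Xmipp implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the Xmipp definitions.
enum LabelId { LABEL_X, LABEL_Y, LABEL_IMAGE };
enum LabelType { TYPE_DOUBLE, TYPE_STRING };

// What is known about one label: its file string and its data type.
struct LabelInfo
{
    std::string name;
    LabelType type;
};

// Table relating each label id to its string and type.
inline const std::map<LabelId, LabelInfo> &labelTable()
{
    static const std::map<LabelId, LabelInfo> table = {
        {LABEL_X, {"x", TYPE_DOUBLE}},
        {LABEL_Y, {"y", TYPE_DOUBLE}},
        {LABEL_IMAGE, {"image", TYPE_STRING}}};
    return table;
}

// String written to (and expected in) metadata files for a label id.
inline std::string labelToString(LabelId id)
{
    assert(labelTable().count(id) == 1); // every id must be registered
    return labelTable().at(id).name;
}

// Data type of the values stored under a label id.
inline LabelType labelToType(LabelId id)
{
    assert(labelTable().count(id) == 1);
    return labelTable().at(id).type;
}
```

With such a table, a reader can check the type of a column before parsing its values, and a writer can emit the right column name for each id.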

The basic use of the metadata class is best illustrated by examples.

Example 1: Read a metadata file, access individual values, modify them and save them, using a single metadata object

Example 2: Using aggregate functions

An aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value. Common aggregate functions include:

  • Count()
  • Maximum()
  • Minimum()
  • Sum()
  • ...
We will use the following table in the examples, saved as metadata in a file called md.xmd:

X Y image
500 1000 Hansen
600 1600 Nilsen
700 700 Hansen
500 300 Hansen
600 2000 Jensen
500 100 Nilsen

Example of aggregate function COUNT

md.read("md.xmd");
mdOut.aggregate(md, AGGR_COUNT, MDL_X, MDL_Y, MDL_COUNT);

where

  • md: is the input metadata
  • mdOut: is the output metadata
  • AGGR_COUNT: identifies the aggregate function, in this case count
  • MDL_X: attribute used to aggregate
  • MDL_Y: attribute over which the aggregation function will operate
  • MDL_COUNT: label for the new column with the resulting data
Output will be

X Count
500 3
600 2
700 1

Example of aggregate function SUM

md.read("md.xmd");
mdOut.aggregate(md, AGGR_SUM, MDL_X, MDL_Y, MDL_SUM);

Output will be

X Sum
500 1400
600 3600
700 700
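
To make the grouping explicit, the COUNT and SUM results above can be reproduced with a stand-alone sketch that does not use Xmipp at all. The names Row, exampleTable, countByX, sumYByX and checkAggregates are hypothetical; the sketch only mirrors the idea of grouping by X and operating on Y.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One row of the example table (only the numeric columns are needed here).
struct Row
{
    int x;
    int y;
};

// The six rows of the table stored in md.xmd.
inline std::vector<Row> exampleTable()
{
    return {{500, 1000}, {600, 1600}, {700, 700},
            {500, 300}, {600, 2000}, {500, 100}};
}

// Group rows by x and count the rows in each group (analogue of AGGR_COUNT).
inline std::map<int, int> countByX(const std::vector<Row> &rows)
{
    std::map<int, int> result;
    for (const Row &r : rows)
        ++result[r.x];
    return result;
}

// Group rows by x and add up y within each group (analogue of AGGR_SUM).
inline std::map<int, int> sumYByX(const std::vector<Row> &rows)
{
    std::map<int, int> result;
    for (const Row &r : rows)
        result[r.x] += r.y;
    return result;
}

// Usage: compare the sketch with the outputs listed in the text.
inline void checkAggregates()
{
    const std::map<int, int> counts = countByX(exampleTable());
    const std::map<int, int> sums = sumYByX(exampleTable());
    assert(counts.at(500) == 3 && counts.at(600) == 2 && counts.at(700) == 1);
    assert(sums.at(500) == 1400 && sums.at(600) == 3600 && sums.at(700) == 700);
}
```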

Example 3: Complex queries

Get all rows from the metadata such that x=3 AND y=4:

MDValueEQ eq1(MDL_X, 3.);
MDValueEQ eq2(MDL_Y, 4.);
MDMultiQuery multi;
multi.addAndQuery(eq1);
multi.addAndQuery(eq2);
auxMetadata.importObjects(auxMetadata3, multi);
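
The logic of an AND query can be sketched without Xmipp as a list of conditions that must all hold for a row to be selected. Point, RowQuery, AndQuery, importMatching and checkAndQuery are hypothetical names for this illustration; only addAndQuery is borrowed from the snippet above, and its behaviour here is an assumption about the general idea, not a description of MDMultiQuery.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A row with the two columns used in the query.
struct Point
{
    double x;
    double y;
};

using RowQuery = std::function<bool(const Point &)>;

// Combines several conditions; a row matches only if all of them hold.
class AndQuery
{
public:
    void addAndQuery(RowQuery q) { queries.push_back(std::move(q)); }

    bool matches(const Point &p) const
    {
        for (const RowQuery &q : queries)
            if (!q(p))
                return false;
        return true; // note: an empty query matches every row
    }

private:
    std::vector<RowQuery> queries;
};

// Copy the rows of `source` that satisfy `query`.
inline std::vector<Point> importMatching(const std::vector<Point> &source,
                                         const AndQuery &query)
{
    std::vector<Point> out;
    for (const Point &p : source)
        if (query.matches(p))
            out.push_back(p);
    return out;
}

// Usage: select the rows with x == 3 AND y == 4.
inline void checkAndQuery()
{
    AndQuery multi;
    multi.addAndQuery([](const Point &p) { return p.x == 3.; });
    multi.addAndQuery([](const Point &p) { return p.y == 4.; });
    const std::vector<Point> rows = {{3., 4.}, {3., 5.}, {2., 4.}, {3., 4.}};
    assert(importMatching(rows, multi).size() == 2);
}
```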

Example 4: Regular expressions

Metadata are stored in Star files. Star files may contain several metadata tables. It is possible to read several metadata tables with a single command using regular expressions. All these metadata objects will be merged into a single one.

auxMetadata.read((String)"block_00000[12]@kk");

This code reads the metadata "block_000001@kk" and "block_000002@kk". The result is the union of both metadata objects and is stored in auxMetadata.
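
How a pattern such as block_00000[12] selects block names can be tried out with the standard C++ regex library. This is a stand-alone sketch: selectBlocks and checkSelectBlocks are hypothetical helpers, and matching the whole block name (rather than a substring) is an assumption, not a statement about how Xmipp applies the expression.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the block names that match `pattern` in full.
inline std::vector<std::string> selectBlocks(const std::vector<std::string> &blockNames,
                                             const std::string &pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> selected;
    for (const std::string &name : blockNames)
        if (std::regex_match(name, re)) // whole-name match (assumption)
            selected.push_back(name);
    return selected;
}

// Usage: the pattern from the text picks blocks 1 and 2 only.
inline void checkSelectBlocks()
{
    const std::vector<std::string> names = {"block_000001", "block_000002", "block_000003"};
    const std::vector<std::string> picked = selectBlocks(names, "block_00000[12]");
    assert(picked.size() == 2);
    assert(picked[0] == "block_000001" && picked[1] == "block_000002");
}
```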

Example 5: joins

Join operations will create one output metadata merging information from corresponding rows in two input metadatas.

outputMD.join(inputMD1,inputMD2,MDL_XXX,JOIN_TYPE);

The label MDL_XXX sets the join condition, and JOIN_TYPE is one of the following:

  • INNER_JOIN: For each row R1 of inputMD1, the joined metadata has a row for each row in inputMD2 that satisfies inputMD2.XXX=inputMD1.XXX.
  • LEFT_OUTER JOIN: First, an inner join is performed. Then, for each row in inputMD1 that does not satisfy the join condition with any row in inputMD2, a joined row is added with null values in the columns of inputMD2. Thus, the joined table unconditionally has at least one row for each row in inputMD1.
  • OUTER_JOIN: First, an inner join is performed. Then, for each row in inputMD1 that does not satisfy the join condition with any row in inputMD2, a joined row is added with null values in the columns of inputMD2. Also, for each row of inputMD2 that does not satisfy the join condition with any row in inputMD1, a joined row with null values in the columns of inputMD1 is added.
  • NATURAL: compares those columns that appear in both input metadatas. These columns appear only once in the output table.
Joins are useful if MDL_XXX is unique, that is, a given value of XXX never repeats within a given metadata.
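
The difference between an inner join and a left outer join can be seen in a stand-alone sketch over two small tables. LeftRow, RightRow, JoinedRow, joinOnKey and checkJoins are hypothetical names, and the empty string plus the hasRight flag stand in for the null values mentioned above; none of this is Xmipp code.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct LeftRow { int key; std::string a; };
struct RightRow { int key; std::string b; };

// Result row; hasRight is false when the right-hand columns are "null".
struct JoinedRow { int key; std::string a; std::string b; bool hasRight; };

// Inner join on `key`; with keepUnmatchedLeft the result is a left outer join.
inline std::vector<JoinedRow> joinOnKey(const std::vector<LeftRow> &left,
                                        const std::vector<RightRow> &right,
                                        bool keepUnmatchedLeft)
{
    std::vector<JoinedRow> out;
    for (const LeftRow &l : left)
    {
        bool matched = false;
        for (const RightRow &r : right)
            if (r.key == l.key)
            {
                out.push_back({l.key, l.a, r.b, true});
                matched = true;
            }
        if (!matched && keepUnmatchedLeft)
            out.push_back({l.key, l.a, "", false}); // no partner on the right
    }
    return out;
}

// Usage: keys 1 and 3 have partners, key 2 does not.
inline void checkJoins()
{
    const std::vector<LeftRow> left = {{1, "a1"}, {2, "a2"}, {3, "a3"}};
    const std::vector<RightRow> right = {{1, "b1"}, {3, "b3"}, {4, "b4"}};
    assert(joinOnKey(left, right, false).size() == 2); // inner join
    const std::vector<JoinedRow> outer = joinOnKey(left, right, true);
    assert(outer.size() == 3); // left outer join keeps key 2
    assert(outer[1].key == 2 && !outer[1].hasRight);
}
```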

Example 5a: Natural join

The following command uses as input metadatas mDsource and auxMetadata3. For each different value of the attribute MDL_X, a new row is created in metadata auxMetadata with the merging of the rows in metadata mDsource and auxMetadata3 such that mDsource.x=auxMetadata3.x.

auxMetadata.join(mDsource,auxMetadata3,MDL_X,NATURAL);

Example 6: Intersect, union and subtraction Operators

Using the operators UNION, INTERSECT and SUBTRACTION the output of more than one input metadata can be combined to form a single metadata. The UNION operator returns all rows that are in one or both of the input metadatas. The INTERSECT operator returns all rows that are strictly in both input metadatas. The SUBTRACTION operator returns the rows that are in the first input Metadata but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified.
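
The three operators behave like the set algorithms of the C++ standard library. The sketch below applies them to plain lists of X values and is independent of Xmipp; sortedUnique, unionOf, intersectionOf, subtractionOf and checkSetOperations are hypothetical helper names, and duplicates are always removed here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sort and drop duplicates; the std::set_* algorithms need sorted input.
inline std::vector<int> sortedUnique(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Values that are in one or both inputs.
inline std::vector<int> unionOf(const std::vector<int> &a, const std::vector<int> &b)
{
    const std::vector<int> sa = sortedUnique(a), sb = sortedUnique(b);
    std::vector<int> out;
    std::set_union(sa.begin(), sa.end(), sb.begin(), sb.end(), std::back_inserter(out));
    return out;
}

// Values that are in both inputs.
inline std::vector<int> intersectionOf(const std::vector<int> &a, const std::vector<int> &b)
{
    const std::vector<int> sa = sortedUnique(a), sb = sortedUnique(b);
    std::vector<int> out;
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(), std::back_inserter(out));
    return out;
}

// Values that are in the first input but not in the second.
inline std::vector<int> subtractionOf(const std::vector<int> &a, const std::vector<int> &b)
{
    const std::vector<int> sa = sortedUnique(a), sb = sortedUnique(b);
    std::vector<int> out;
    std::set_difference(sa.begin(), sa.end(), sb.begin(), sb.end(), std::back_inserter(out));
    return out;
}

// Usage with two small lists of X values.
inline void checkSetOperations()
{
    const std::vector<int> a = {500, 600, 700, 500};
    const std::vector<int> b = {600, 800};
    assert((unionOf(a, b) == std::vector<int>{500, 600, 700, 800}));
    assert((intersectionOf(a, b) == std::vector<int>{600}));
    assert((subtractionOf(a, b) == std::vector<int>{500, 700}));
}
```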

Example 6a: Intersection

auxMetadata.intersection(mDsource,MDL_X);

Example 7: Other useful methods: size and sort

size returns the number of lines in the metadata, and sort creates a new metadata sorted by a given label.

Example 7a: Size

Size of metadata mDsource

size_t t=mDsource.size();

Example 7b: Sort

Sort metadata auxMetadata by the label MDL_X, lowest values first. The output metadata contains at most 2 rows, and the first row used is row number 1.

auxMetadata2.sort(auxMetadata,MDL_X,true,2,1);
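
The combination of ordering, row limit and starting row can be sketched on a plain list of values. sortedSlice and checkSortedSlice are hypothetical helpers, and treating the starting row as a zero-based count of rows to skip is an assumption made for this illustration, not a confirmed property of the sort method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sort `values`, skip `offset` rows, then keep at most `limit` rows.
// A negative limit means "no limit" (assumption for this sketch).
inline std::vector<double> sortedSlice(std::vector<double> values, bool ascending,
                                       int limit, int offset)
{
    if (ascending)
        std::sort(values.begin(), values.end());
    else
        std::sort(values.begin(), values.end(), std::greater<double>());

    std::size_t first = offset > 0 ? static_cast<std::size_t>(offset) : 0;
    if (first > values.size())
        first = values.size();
    std::size_t count = values.size() - first;
    if (limit >= 0 && static_cast<std::size_t>(limit) < count)
        count = static_cast<std::size_t>(limit);
    return std::vector<double>(values.begin() + first, values.begin() + first + count);
}

// Usage: ascending order, at most 2 rows, skipping the first sorted row.
inline void checkSortedSlice()
{
    const std::vector<double> x = {30., 10., 40., 20.};
    assert((sortedSlice(x, true, 2, 1) == std::vector<double>{20., 30.}));
}
```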

Example 8: Rows

Instead of accessing the data as label-value pairs, it is possible (and more efficient) to read whole rows.

Example 8a: Rows

#include <iostream>
#include <data/metadata_extension.h>

MetaData md, md1;   // input and output metadata objects
MDRow row;          // structure holding one metadata row
FileName fn;        // input metadata file
fn.compose("block1", "myfile.emx");
double samplingRate;
String errorMessage;
md.read(fn);        // read the metadata
FOR_ALL_OBJECTS_IN_METADATA(md) // loop through all rows
{
    md.getRow(row, __iter.objId); // read the current row
    if (row.getValue(MDL_CTF_SAMPLING_RATE, samplingRate)) // get the value of ctf_sampling_rate
        std::cerr << "The sampling rate is: " << samplingRate << std::endl;
    else
    {
        errorMessage = formatString("Cannot find label %s ", MDL::label2Str(MDL_CTF_SAMPLING_RATE).c_str());
        REPORT_ERROR(ERR_MD_MISSINGLABEL, errorMessage);
    }
    row.setValue(MDL_CTF_SAMPLING_RATE, samplingRate * 2.); // store twice the sampling rate
    md1.addRow(row); // append the modified row to the output metadata
}
md1.write(fn); // save the metadata to file