Analysis

Analysis: Modules, chains, nodes, and links

Automated image analysis is a significant part of why we bother with OME. Using analytic software to extract information from images is an active area of research. In many cases, these analyses feed upon each other, with the output of one being the input of the next. Constructing useful sequences of analyses may be an important outcome of work with OME. Thus, as you might expect, the OME database contains a rich body of information about analysis programs and their use.

The basic unit of analysis in OME is a module. Modules are organized into hierarchical categories (stored in the MODULE_CATEGORIES table). Each module also has a type, which refers to a Perl class that implements the OME::Analysis::Handler interface, and a location, which can be a filename for a command-line executable or a Perl class. The EXECUTION_INSTRUCTIONS column contains an XML fragment that defines how the module is to be invoked. Modules can iterate over features in images; the module's DEFAULT_ITERATOR specifies how this is done. Finally, the module's NEW_FEATURE_TAG defines a tag that will be assigned to any new features created by the module.

For modules that are command-line executables, the CLI.xsd schema defines a template for specifying the structure of the inputs and outputs. Instances of this schema will be the contents of the EXECUTION_INSTRUCTIONS column.

The inputs and outputs of analysis modules are defined in the FORMAL_INPUTS and FORMAL_OUTPUTS tables, which specify the semantic types that each module takes as input and produces as output. Each entry in these tables refers back to the SEMANTIC_TYPES table. Formal inputs may be restricted to a fixed list of values; in that case, the allowed values are given by entries in the LOOKUP_TABLES table.

Analysis modules are connected into directed acyclic graphs (DAGs) via analysis chains, which are stored in the ANALYSIS_CHAINS table. An analysis chain contains a number of nodes (ANALYSIS_CHAIN_NODES), each of which specifies a module and may override that module's iterator and new feature tag. The ANALYSIS_CHAIN_LINKS table specifies the links in the analysis chain in terms of "from" and "to" nodes, along with the output of the "from" node and the input of the "to" node. Links must be sensible: the semantic types of the "from" output and the "to" input must match, and both nodes must be members of the same analysis chain.
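The two link rules just described (matching semantic types, same chain) can be sketched in a few lines of Python. This is only an illustration: the Node and Link classes, their field names, and the semantic type names "TypeA" and "TypeB" are hypothetical stand-ins for database rows, not part of the OME code base.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for ANALYSIS_CHAIN_NODES and
# ANALYSIS_CHAIN_LINKS rows; field names are assumptions for illustration.

@dataclass
class Node:
    node_id: int
    chain_id: int
    outputs: dict  # formal output name -> semantic type name
    inputs: dict   # formal input name -> semantic type name

@dataclass
class Link:
    from_node: int
    from_output: str
    to_node: int
    to_input: str

def link_problems(link, nodes):
    """Return the reasons a link is not sensible (empty list if it is fine)."""
    src, dst = nodes[link.from_node], nodes[link.to_node]
    problems = []
    if src.chain_id != dst.chain_id:
        problems.append("nodes belong to different chains")
    out_type = src.outputs.get(link.from_output)
    in_type = dst.inputs.get(link.to_input)
    if out_type is None:
        problems.append(f"unknown output {link.from_output!r}")
    if in_type is None:
        problems.append(f"unknown input {link.to_input!r}")
    if out_type is not None and in_type is not None and out_type != in_type:
        problems.append(f"semantic type mismatch: {out_type} vs {in_type}")
    return problems

# Made-up example data: two nodes in the same chain.
nodes = {
    1: Node(1, chain_id=1, outputs={"out": "TypeA"}, inputs={}),
    2: Node(2, chain_id=1, outputs={}, inputs={"in": "TypeA", "other": "TypeB"}),
}
good = Link(1, "out", 2, "in")
bad = Link(1, "out", 2, "other")

print(link_problems(good, nodes))
print(link_problems(bad, nodes))
```

Returning a list of problems rather than a single boolean makes it easier to report every violation on a link at once.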

Analysis paths are specific paths through analysis chains. They are defined by entries in the ANALYSIS_PATHS and ANALYSIS_PATH_MAP tables, which create paths and specify their contents, respectively.

To make this more concrete, let's look at analysis chain #1:

ome=# select * from analysis_chains where analysis_chain_id=1;
 analysis_chain_id | owner |         name          | description | locked 
-------------------+-------+-----------------------+-------------+--------
                 1 |     1 | Image import analyses |             | t
(1 row)

This chain has three nodes, corresponding to modules 5, 6, and 7.

ome=# select analysis_chain_node_id, module_id from analysis_chain_nodes where analysis_chain_id=1;
 analysis_chain_node_id | module_id 
------------------------+-----------
                      3 |         7
                      2 |         5
                      1 |         6
(3 rows)

There are two links in this chain — one from node 3 to node 2, and another from node 3 to node 1.

ome=# select analysis_chain_link_id, from_node,to_node from analysis_chain_links where analysis_chain_id=1;
 analysis_chain_link_id | from_node | to_node 
------------------------+-----------+---------
                      2 |         3 |       2
                      1 |         3 |       1
(2 rows)
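Whether a set of links really forms a DAG can be checked mechanically. The sketch below applies Kahn's topological sort to the (from_node, to_node) pairs from the query result above. It is a generic graph check written for illustration, not the routine OME itself uses, and the function name is hypothetical.

```python
from collections import defaultdict, deque

# (from_node, to_node) pairs copied from the analysis_chain_links result.
links = [(3, 2), (3, 1)]
nodes = {1, 2, 3}

def topological_order(nodes, links):
    """Return the nodes in dependency order; raise ValueError on a cycle."""
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for src, dst in links:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from the nodes that nothing points to (the roots of the chain).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)

    # Any node left unvisited sits on a cycle.
    if len(order) != len(nodes):
        raise ValueError("links contain a cycle")
    return order

order = topological_order(nodes, links)
print(order)
```

For chain #1 the only root is node 3, so it comes first in the returned order.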

Thus, this chain is a DAG, starting from node 3 and progressing to either node 2 or node 1. This structure is made explicit in the ANALYSIS_PATHS and ANALYSIS_PATH_MAP tables. The ANALYSIS_PATHS table tells us that there are two paths of length 2 for this chain:

ome=# select * from analysis_paths where analysis_chain_id=1;
 path_id | path_length | analysis_chain_id 
---------+-------------+-------------------
       1 |           2 |                 1
       2 |           2 |                 1
(2 rows)

The ANALYSIS_PATH_MAP table tells us that the first element (path_order 0) of the first path is node 3 and the second element (path_order 1) of that path is node 1. Similarly, the first element of the second path is node 3 and the second element is node 2.

ome=# select * from analysis_path_map;
 path_id | path_order | analysis_chain_node_id 
---------+------------+------------------------
       1 |          0 |                      3
       1 |          1 |                      1
       2 |          0 |                      3
       2 |          1 |                      2
(4 rows)
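The relationship between the link table and the path tables can be reproduced with a short depth-first walk. The sketch below is an assumption about how such rows could be derived, not a description of the OME implementation. It enumerates every root-to-leaf path from the links of chain #1 and flattens them into (path_id, path_order, analysis_chain_node_id) rows. The function name and the choice to number paths from 1 in ascending child order are hypothetical, and nodes that take part in no link are ignored.

```python
from collections import defaultdict

# (from_node, to_node) pairs for analysis chain #1, as shown earlier.
links = [(3, 2), (3, 1)]

def enumerate_paths(links):
    """Return every root-to-leaf path through the graph as a list of node ids."""
    successors = defaultdict(list)
    targets = set()
    for src, dst in links:
        successors[src].append(dst)
        targets.add(dst)

    # Roots are nodes that appear as a source but never as a target.
    roots = sorted(set(successors) - targets)
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        children = successors.get(node, [])
        if not children:
            paths.append(prefix)  # reached a leaf: record the full path
        for child in sorted(children):
            walk(child, prefix)

    for root in roots:
        walk(root, [])
    return paths

paths = enumerate_paths(links)

# Flatten into rows shaped like the analysis_path_map result.
rows = [
    (path_id, order, node)
    for path_id, path in enumerate(paths, start=1)
    for order, node in enumerate(path)
]
for row in rows:
    print(row)
```

With children visited in ascending order, the rows happen to line up with the query output above; how OME actually assigns path_id values is not stated in this document.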

The OME database tracks the executions of modules and chains. The MODULE_EXECUTIONS table contains an entry for each "run" of a module against a dataset. ACTUAL_INPUTS tracks the inputs used for each run, and SEMANTIC_TYPE_OUTPUTS tracks the results of the run. The MODULE_EXECUTIONS table is also used to cache results: if a module is about to be executed with the same inputs as a previous execution, the previous results are reused, and no new entry is added to the MODULE_EXECUTIONS table.

The ANALYSIS_NODE_EXECUTIONS table tracks executions of a node in an analysis chain. Entries are added to this table even when the results of the associated module are retrieved from the MODULE_EXECUTIONS cache. The ANALYSIS_CHAIN_EXECUTIONS table tracks the execution of entire chains.
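The interplay between the two tables can be illustrated with a small in-memory model: module results are cached by their inputs, while every node run is logged whether or not the cache was hit. Everything here is a hypothetical sketch. The ExecutionLog class, the shape of the cache key, and the fake_module function are assumptions made for illustration; the actual OME schema may key executions differently (for example, by dataset as well).

```python
class ExecutionLog:
    """Toy model of MODULE_EXECUTIONS (a cache) and ANALYSIS_NODE_EXECUTIONS (a log)."""

    def __init__(self):
        self.module_executions = {}  # cache key -> stored results
        self.node_executions = []    # one entry per node run, cached or not

    def run_node(self, node_id, module_id, inputs, compute):
        # Assumed cache key: the module plus its inputs (values must be hashable).
        key = (module_id, tuple(sorted(inputs.items())))
        cached = key in self.module_executions
        if not cached:
            # First time this module sees these inputs: run it and store the result.
            self.module_executions[key] = compute(**inputs)
        # The node execution is recorded regardless of where the result came from.
        self.node_executions.append((node_id, key, cached))
        return self.module_executions[key]

calls = []

def fake_module(threshold):
    """Stand-in for a real analysis module; records each real invocation."""
    calls.append(threshold)
    return {"count": threshold * 2}

log = ExecutionLog()
first = log.run_node(node_id=1, module_id=6, inputs={"threshold": 5}, compute=fake_module)
second = log.run_node(node_id=1, module_id=6, inputs={"threshold": 5}, compute=fake_module)

print(len(log.module_executions), len(log.node_executions), len(calls))
```

After the two identical runs, the model holds one cached module execution and two node executions, and the module itself was invoked only once.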

The structure of analysis chains, modules, and so on is recapitulated as an XML schema in AnalysisModule.xsd. This schema also includes stubs for features that are not yet implemented. For example, the Program element will eventually describe installation packages and scripts.

The CoreChains file in the ./src/SQL directory specifies three analysis chains that are loaded by the OME bootstrap script. These chains are all specified as instances of the AnalysisChains.xsd schema, and are found in the ./src/xml directory.

Code for a variety of analysis programs can be found in the ./src/C directory.
