Analysis
Analysis: Modules, chains, nodes, and links
Automated image analysis is a significant part of why we bother with OME. The use of analytic software to extract information from images is an active area of ongoing research. In many cases, these analyses feed upon each other — the output of one being the input of the next. Construction of useful sequences of analyses may be an important outcome of work with OME. Thus, as you might expect, the OME database contains a rich body of information about analysis programs and their utilization.
The basic unit of analysis in OME is a module. Modules are organized into hierarchical categories (stored in the MODULE_CATEGORIES table). Modules also have types, which refer to Perl classes that implement the OME::Analysis::Handler interface, and locations, which can be filenames for command-line executables or Perl classes. The EXECUTION_INSTRUCTIONS column contains an XML fragment that defines how the module is to be invoked. Modules can iterate over features in images; the module's DEFAULT_ITERATOR specifies how this is done. Finally, the module's NEW_FEATURE_TAG defines a tag that will be assigned to any new features created by the module.
For modules that are command-line executables, the CLI.xsd schema defines a template for specifying the structure of the inputs and outputs. Instances of this schema will be the contents of the EXECUTION_INSTRUCTIONS column.
The inputs and outputs of analysis modules are defined in the FORMAL_INPUTS and FORMAL_OUTPUTS tables, which specify the semantic types that each module takes as inputs and produces as outputs. Each entry in these tables refers back to the SEMANTIC_TYPES table.
Formal inputs may be restricted to a small set or list of values; in this case, the permitted values are given by entries in the LOOKUP_TABLES table.
Analysis modules are connected into directed acyclic graphs (DAGs) via analysis chains, which are stored in the ANALYSIS_CHAINS table. An analysis chain contains a number of nodes (ANALYSIS_CHAIN_NODES), each of which specifies a module and may override that module's iterator and new feature tag. The ANALYSIS_CHAIN_LINKS table specifies the links in the analysis chain in terms of "from" and "to" nodes, along with the output of the "from" node and the input of the "to" node. Of course, links must be sensible: the semantic types of the "from" outputs and the "to" inputs must match, and the nodes must be members of the same analysis chain.
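The two conditions for a sensible link can be sketched as a small check. This is an illustration only, not OME's own validation code: the `Node` and `Link` classes, their field names, the formal input and output names, and the `TypeA`/`TypeB` semantic types are all hypothetical stand-ins for rows of the tables described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    chain_id: int
    module_id: int


@dataclass(frozen=True)
class Link:
    from_node: int
    from_output: str  # formal output name on the "from" node's module
    to_node: int
    to_input: str     # formal input name on the "to" node's module


def link_problems(link, nodes, output_types, input_types):
    """Return the reasons a link is not sensible; an empty list means it is fine."""
    src = nodes[link.from_node]
    dst = nodes[link.to_node]
    problems = []
    # Both nodes must be members of the same analysis chain.
    if src.chain_id != dst.chain_id:
        problems.append("nodes are in different chains")
    # The semantic type of the "from" output must match that of the "to" input.
    out_type = output_types.get((src.module_id, link.from_output))
    in_type = input_types.get((dst.module_id, link.to_input))
    if out_type is None or in_type is None:
        problems.append("unknown formal output or input")
    elif out_type != in_type:
        problems.append("semantic types differ")
    return problems


# Hypothetical data: two nodes in chain 1 and one node in chain 2.
nodes = {3: Node(3, 1, 7), 1: Node(1, 1, 6), 9: Node(9, 2, 6)}
output_types = {(7, "out"): "TypeA"}                       # module 7 emits TypeA
input_types = {(6, "in"): "TypeA", (6, "other"): "TypeB"}  # module 6 inputs

good = link_problems(Link(3, "out", 1, "in"), nodes, output_types, input_types)
bad = link_problems(Link(3, "out", 9, "other"), nodes, output_types, input_types)
print(good)
print(bad)
```

Returning a list of problems rather than a single boolean makes it easier to report every reason a link was rejected.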
Analysis paths are specific paths through analysis chains. They are defined by entries in the ANALYSIS_PATHS and ANALYSIS_PATH_MAP tables, which create paths and specify their contents, respectively.
To make this more concrete, let's look at analysis chain #1:
ome=# select * from analysis_chains where analysis_chain_id=1;
 analysis_chain_id | owner |         name          | description | locked
-------------------+-------+-----------------------+-------------+--------
                 1 |     1 | Image import analyses |             | t
(1 row)
This chain has three nodes, corresponding to modules 5, 6, and 7.
ome=# select analysis_chain_node_id, module_id from analysis_chain_nodes where analysis_chain_id=1;
 analysis_chain_node_id | module_id
------------------------+-----------
                      3 |         7
                      2 |         5
                      1 |         6
(3 rows)
There are two links in this chain — one from node 3 to node 2, and another from node 3 to node 1.
ome=# select analysis_chain_link_id, from_node,to_node from analysis_chain_links where analysis_chain_id=1;
 analysis_chain_link_id | from_node | to_node
------------------------+-----------+---------
                      2 |         3 |       2
                      1 |         3 |       1
(2 rows)
Thus, this chain is a DAG, starting from node 3 and progressing to either node 2 or node 1. This structure is made explicit in the ANALYSIS_PATHS and ANALYSIS_PATH_MAP tables.
The ANALYSIS_PATHS table tells us that there are two paths of length 2 for this chain:
ome=# select * from analysis_paths where analysis_chain_id=1;
 path_id | path_length | analysis_chain_id
---------+-------------+-------------------
       1 |           2 |                 1
       2 |           2 |                 1
(2 rows)
The ANALYSIS_PATH_MAP table tells us that the first element of the first path is node 3 and the second element of that path is node 1. Similarly, the first element of the second path is node 3 and the second element is node 2.
ome=# select * from analysis_path_map;
 path_id | path_order | analysis_chain_node_id
---------+------------+------------------------
       1 |          0 |                      3
       1 |          1 |                      1
       2 |          0 |                      3
       2 |          1 |                      2
(4 rows)
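The relationship between the links and the path tables can be illustrated by enumerating every root-to-leaf path of the chain. The sketch below is one plausible way to derive such paths and is not taken from the OME source; the function name `enumerate_paths` is hypothetical, and the path numbering it produces is only an assumption that happens to agree with the rows shown above. The node and link values are copied from the queries for analysis chain #1.

```python
from collections import defaultdict


def enumerate_paths(node_ids, links):
    """Return every root-to-leaf path of a DAG given as (from_node, to_node) pairs.

    Assumes the links really form a DAG; a cycle would recurse without end.
    """
    successors = defaultdict(list)
    has_incoming = set()
    for src, dst in links:
        successors[src].append(dst)
        has_incoming.add(dst)

    # Roots are nodes that no link points to.
    roots = sorted(n for n in node_ids if n not in has_incoming)
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not successors[node]:  # leaf: the path is complete
            paths.append(prefix)
            return
        for nxt in sorted(successors[node]):
            walk(nxt, prefix)

    for root in roots:
        walk(root, [])
    return paths


# Nodes and links of analysis chain #1, from the query results above.
nodes = [1, 2, 3]
links = [(3, 2), (3, 1)]

paths = enumerate_paths(nodes, links)
for path_id, path in enumerate(paths, start=1):
    for path_order, node_id in enumerate(path):
        print(path_id, path_order, node_id)
```

For this chain the loop prints the same four (path_id, path_order, node) triples as the ANALYSIS_PATH_MAP query, and each path has length 2, as in ANALYSIS_PATHS.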
The OME database tracks the executions of modules and chains. The MODULE_EXECUTIONS table contains an entry for each "run" of a module against a dataset. ACTUAL_INPUTS tracks the inputs used for each run, and SEMANTIC_TYPE_OUTPUTS tracks the results of the run. The MODULE_EXECUTIONS table is also used to cache results: if a module is about to be executed with the same inputs as a previous execution, the previous results are reused, and no new entry is added to the MODULE_EXECUTIONS table.
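The caching behavior amounts to memoizing a module run on the pair (module, inputs). The following is a minimal in-memory sketch of that idea, assuming hashable input values; the class `ModuleExecutionCache`, its `run` method, and the `fake_module` function are hypothetical names used for illustration and do not correspond to OME's Perl API.

```python
class ModuleExecutionCache:
    """Toy stand-in for MODULE_EXECUTIONS used as a result cache."""

    def __init__(self):
        # Maps (module_id, frozen inputs) to the stored outputs.
        self._executions = {}

    def run(self, module_id, inputs, compute):
        """Return (outputs, reused); compute is only called on a cache miss."""
        key = (module_id, frozenset(inputs.items()))
        if key in self._executions:
            # Same module, same inputs: reuse the results, add no new entry.
            return self._executions[key], True
        outputs = compute(**inputs)
        self._executions[key] = outputs
        return outputs, False

    def __len__(self):
        return len(self._executions)


calls = []


def fake_module(a, b):
    """Hypothetical analysis module that records each real invocation."""
    calls.append((a, b))
    return a + b


cache = ModuleExecutionCache()
first, first_reused = cache.run(5, {"a": 1, "b": 2}, fake_module)
second, second_reused = cache.run(5, {"a": 1, "b": 2}, fake_module)
print(first, first_reused, second, second_reused, len(calls), len(cache))
```

The second call returns the stored result without invoking the module again, so both the call log and the cache hold a single entry.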
The ANALYSIS_NODE_EXECUTIONS table tracks executions of a node in an analysis chain. Entries are added to this table even when the results of the associated module are retrieved from the MODULE_EXECUTIONS cache. The ANALYSIS_CHAIN_EXECUTIONS table tracks the execution of entire chains.
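The difference between the two levels of bookkeeping can be shown with two plain containers: one that gains an entry only for a genuinely new module run, and one that gains an entry for every node execution. This is a sketch under assumptions, not OME's schema: the dictionary keys, the `reused` flag, and the idea that a node execution stores the id of the module execution it used are all hypothetical.

```python
module_executions = {}  # stands in for MODULE_EXECUTIONS: key -> execution id
node_executions = []    # stands in for ANALYSIS_NODE_EXECUTIONS


def execute_node(node_id, module_id, inputs):
    """Record a node execution, reusing a prior module execution when possible."""
    key = (module_id, tuple(sorted(inputs.items())))
    reused = key in module_executions
    if not reused:
        # New combination of module and inputs: record a new module execution.
        module_executions[key] = len(module_executions) + 1
    # A node execution is recorded whether or not the result was cached.
    node_executions.append(
        {"node": node_id, "module_execution": module_executions[key], "reused": reused}
    )
    return module_executions[key]


# Node 3 of chain #1 uses module 7; the input name and value are made up.
execute_node(3, 7, {"image": 42})
execute_node(3, 7, {"image": 42})
print(len(module_executions), len(node_executions))
```

After two identical runs of the same node there is one module execution and two node executions, which matches the behavior described above.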
The structures of the analysis chains, modules, etc. are recapitulated as an XML schema in AnalysisModule.xsd. This schema also includes stubs for features that are not yet implemented. For example, the Program element will eventually describe installation packages and scripts.
The CoreChains file in the ./src/SQL directory specifies three analysis chains that are loaded by the OME bootstrap script. These chains are all specified as instances of the AnalysisChains.xsd schema, and are found in the ./src/xml directory.
Code for a variety of analysis programs can be found in the ./src/C directory.