Semantic Types and Friends
Semantic types, semantic elements, data tables, and data columns
The concept of a "semantic type" plays a key role in OME. A semantic type is essentially a struct or class data element as implemented in the database. The SEMANTIC_TYPES table contains the structures that are needed to define much of the data in the database. Semantic types are linked to semantic elements, which are the components that the semantic types are composed of. Semantic elements are stored in the SEMANTIC_ELEMENTS table, and the two tables are linked by the SEMANTIC_TYPE_ID key.
Semantic types have four levels of granularity:
- Global (G) - Applies to the entire database
- Dataset (D) - Specific to a given dataset
- Image (I) - Specific to a given image
- Feature (F) - Specific to a feature
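Client code that reads the granularity column may find it convenient to map the single-letter codes to named values. The following is a minimal sketch, not part of OME: the `Granularity` class and the `parse_granularity` helper are hypothetical names, and only the four codes listed above are taken from the text.

```python
from enum import Enum


class Granularity(Enum):
    """Single-letter granularity codes as stored in the granularity column."""

    GLOBAL = "G"   # applies to the entire database
    DATASET = "D"  # specific to a given dataset
    IMAGE = "I"    # specific to a given image
    FEATURE = "F"  # specific to a feature


def parse_granularity(code):
    """Map a stored code such as 'G' to its Granularity member.

    Raises ValueError for any code other than G, D, I, or F.
    """
    return Granularity(code)


level = parse_granularity("G")
print(level.name)  # prints GLOBAL
```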
An example should make this abstract discussion more concrete.
If we look at semantic type #8, we get the following output:
ome=# select * from semantic_types where semantic_type_id=8;
 semantic_type_id |    name    | granularity | description
------------------+------------+-------------+-------------
                8 | Experiment | G           |
(1 row)
This tells us that Experiment is a global semantic type, with id #8. Looking at the elements associated with Experiment, we see the following:
ome=# select * from semantic_elements where semantic_type_id=8;
 semantic_element_id | semantic_type_id |     name     | data_column_id | description
---------------------+------------------+--------------+----------------+-------------
                  23 |                8 | Experimenter |             23 |
                  22 |                8 | Description  |             22 |
                  21 |                8 | Type         |             21 |
(3 rows)
Thus, the Experiment type has three elements: Experimenter, Description, and Type.
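The struct-like relationship between a semantic type and its elements can be reproduced with two queries. The following is a minimal sketch rather than OME code: it loads the example rows shown above into an in-memory SQLite database (the SQLite column types are assumptions), and `describe_semantic_type` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE semantic_types (
    semantic_type_id INTEGER PRIMARY KEY,
    name TEXT, granularity TEXT, description TEXT
);
CREATE TABLE semantic_elements (
    semantic_element_id INTEGER PRIMARY KEY,
    semantic_type_id INTEGER REFERENCES semantic_types (semantic_type_id),
    name TEXT, data_column_id INTEGER, description TEXT
);
-- Rows copied from the example output; column types are assumed.
INSERT INTO semantic_types VALUES (8, 'Experiment', 'G', NULL);
INSERT INTO semantic_elements VALUES
    (21, 8, 'Type', 21, NULL),
    (22, 8, 'Description', 22, NULL),
    (23, 8, 'Experimenter', 23, NULL);
""")


def describe_semantic_type(conn, type_id):
    """Return (name, granularity, [element names]) for one semantic type."""
    name, granularity = conn.execute(
        "SELECT name, granularity FROM semantic_types "
        "WHERE semantic_type_id = ?",
        (type_id,),
    ).fetchone()
    elements = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM semantic_elements "
            "WHERE semantic_type_id = ? ORDER BY semantic_element_id",
            (type_id,),
        )
    ]
    return name, granularity, elements


experiment = describe_semantic_type(conn, 8)
print(experiment)
```

The type row plays the role of the struct name, and the element rows play the role of its fields.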
The DATA_COLUMN_ID column in the SEMANTIC_ELEMENTS table plays a key role in the translation between the abstract definitions of these semantic types and their implementation in the database. Specifically, DATA_COLUMN_ID is a reference to an entry in the DATA_COLUMNS table. This entry describes the instantiation of the element in the database. Following up on our example, the entry for data_column_id = 23 is as follows:
ome=# select * from data_columns where data_column_id=23;
 data_column_id | data_table_id | column_name  | description | sql_type  | reference_type
----------------+---------------+--------------+-------------+-----------+----------------
             23 |             8 | EXPERIMENTER |             | reference | Experimenter
(1 row)
Thus, in the database, the semantic element with ID 23 is implemented as a reference to the table Experimenter, that is, as a foreign key.
Note the DATA_TABLE_ID field in the DATA_COLUMNS table. This field points to the entry in the DATA_TABLES table that implements the semantic type. Thus, if we find all of the columns in DATA_COLUMNS with data_table_id = 8, we see the three semantic elements for this semantic type:
ome=# select * from data_columns where data_table_id=8;
 data_column_id | data_table_id | column_name  | description | sql_type  | reference_type
----------------+---------------+--------------+-------------+-----------+----------------
             23 |             8 | EXPERIMENTER |             | reference | Experimenter
             22 |             8 | DESCRIPTION  |             | string    |
             21 |             8 | TYPE         |             | string    |
(3 rows)
Finally, if we look at the entry in DATA_TABLES where data_table_id = 8, we see:
 data_table_id | granularity | table_name  | description
---------------+-------------+-------------+-------------
             8 | G           | EXPERIMENTS |
(1 row)
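The chain from semantic element to data column to data table can be followed with a single join. The following is a minimal sketch, not OME code: it loads the example rows shown above into an in-memory SQLite database (the description columns are omitted and the SQLite column types are assumptions), and the hypothetical helper `storage_locations` reports where each element of a semantic type is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified copies of the three metadata tables; types are assumed.
CREATE TABLE semantic_elements (
    semantic_element_id INTEGER, semantic_type_id INTEGER,
    name TEXT, data_column_id INTEGER
);
CREATE TABLE data_columns (
    data_column_id INTEGER, data_table_id INTEGER,
    column_name TEXT, sql_type TEXT, reference_type TEXT
);
CREATE TABLE data_tables (
    data_table_id INTEGER, granularity TEXT, table_name TEXT
);
INSERT INTO semantic_elements VALUES
    (21, 8, 'Type', 21), (22, 8, 'Description', 22),
    (23, 8, 'Experimenter', 23);
INSERT INTO data_columns VALUES
    (21, 8, 'TYPE', 'string', NULL),
    (22, 8, 'DESCRIPTION', 'string', NULL),
    (23, 8, 'EXPERIMENTER', 'reference', 'Experimenter');
INSERT INTO data_tables VALUES (8, 'G', 'EXPERIMENTS');
""")


def storage_locations(conn, type_id):
    """List (element, table, column, sql_type) for one semantic type."""
    return conn.execute(
        """
        SELECT se.name, dt.table_name, dc.column_name, dc.sql_type
        FROM semantic_elements AS se
        JOIN data_columns AS dc ON dc.data_column_id = se.data_column_id
        JOIN data_tables AS dt ON dt.data_table_id = dc.data_table_id
        WHERE se.semantic_type_id = ?
        ORDER BY se.semantic_element_id
        """,
        (type_id,),
    ).fetchall()


locations = storage_locations(conn, 8)
for element, table, column, sql_type in locations:
    print(f"{element} -> {table}.{column} ({sql_type})")
```

For the Experiment type, all three elements resolve to columns of the EXPERIMENTS table.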
The entries in DATA_COLUMNS and DATA_TABLES are sufficient to generate the tables that store the actual data:
ome=# select * from experiments;
 attribute_id | module_execution_id | type | description | experimenter
--------------+---------------------+------+-------------+--------------
(0 rows)
Each table has an ATTRIBUTE_ID field (essentially, a unique id # for each entry in that table), a MODULE_EXECUTION_ID (more about that later), and all of the fields defined in the associated data columns.
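To illustrate how a storage table could be derived from this metadata, the following sketch builds a CREATE TABLE statement from a DATA_TABLES name and its DATA_COLUMNS rows. It is not OME's actual generator: the `build_create_table` helper, the `SQL_TYPE_MAP` mapping from `sql_type` values to concrete column types, and the choice to order columns by `data_column_id` are all assumptions, and SQLite stands in for the real database.

```python
import sqlite3

# Assumed mapping from OME sql_type names to concrete column types.
SQL_TYPE_MAP = {"string": "TEXT", "reference": "INTEGER"}


def build_create_table(table_name, columns):
    """Build DDL from (data_column_id, column_name, sql_type) tuples.

    Every generated table starts with attribute_id and
    module_execution_id, followed by one column per data column,
    ordered by data_column_id (an assumption of this sketch).
    """
    parts = [
        '"attribute_id" INTEGER PRIMARY KEY',
        '"module_execution_id" INTEGER',
    ]
    for _, column_name, sql_type in sorted(columns):
        parts.append(f'"{column_name.lower()}" {SQL_TYPE_MAP[sql_type]}')
    body = ", ".join(parts)
    return f'CREATE TABLE "{table_name.lower()}" ({body})'


# Rows taken from the data_columns output for data_table_id = 8.
experiment_columns = [
    (23, "EXPERIMENTER", "reference"),
    (22, "DESCRIPTION", "string"),
    (21, "TYPE", "string"),
]
ddl = build_create_table("EXPERIMENTS", experiment_columns)

# Execute the DDL and read the resulting column names back.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
created = [row[1] for row in conn.execute("PRAGMA table_info(experiments)")]
print(created)
```

The resulting column list matches the header of the `experiments` table shown above.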
This is a fairly powerful and flexible structure: these tables provide all of the information needed to construct the database schema that is used to store the "real" OME data. This is what we mean when we say that the database is very "meta".
A closer look at this structure reveals some potentially troubling parallels: SEMANTIC_TYPES and DATA_TABLES look very similar, as do SEMANTIC_ELEMENTS and DATA_COLUMNS. Is this duplication necessary?
This question can be answered by noting that the parallels are not exact. Semantic types and semantic elements define the abstract model of the data, while the data columns and tables define one particular realization of those abstract types. The DATA_COLUMN_ID in the SEMANTIC_ELEMENTS table defines the link between this abstract model and its concrete realization.
This separation allows us to move away from a strictly one-to-one mapping between semantic types and data tables. For example, the semantic types PlaneMean and PlaneGeometricMean both refer back to entries in the table PLANE_STATISTICS.
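The many-to-one mapping can be made visible by grouping semantic types by the data table that stores them. In the sketch below, only the names PlaneMean, PlaneGeometricMean, and PLANE_STATISTICS come from the text. Every ID, element name, and column name is hypothetical and chosen purely for illustration, as is the `types_per_table` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE semantic_types (semantic_type_id INTEGER, name TEXT);
CREATE TABLE semantic_elements (
    semantic_element_id INTEGER, semantic_type_id INTEGER,
    name TEXT, data_column_id INTEGER
);
CREATE TABLE data_columns (
    data_column_id INTEGER, data_table_id INTEGER, column_name TEXT
);
CREATE TABLE data_tables (data_table_id INTEGER, table_name TEXT);

-- All IDs, element names, and column names below are hypothetical.
INSERT INTO semantic_types VALUES
    (101, 'PlaneMean'), (102, 'PlaneGeometricMean');
INSERT INTO semantic_elements VALUES
    (201, 101, 'Mean', 301), (202, 102, 'GeometricMean', 302);
INSERT INTO data_columns VALUES
    (301, 401, 'MEAN'), (302, 401, 'GEOMETRIC_MEAN');
INSERT INTO data_tables VALUES (401, 'PLANE_STATISTICS');
""")


def types_per_table(conn):
    """Return {table_name: [semantic type names stored in that table]}."""
    rows = conn.execute(
        """
        SELECT DISTINCT dt.table_name, st.name
        FROM data_tables AS dt
        JOIN data_columns AS dc ON dc.data_table_id = dt.data_table_id
        JOIN semantic_elements AS se ON se.data_column_id = dc.data_column_id
        JOIN semantic_types AS st ON st.semantic_type_id = se.semantic_type_id
        ORDER BY dt.table_name, st.name
        """
    )
    grouped = {}
    for table_name, type_name in rows:
        grouped.setdefault(table_name, []).append(type_name)
    return grouped


mapping = types_per_table(conn)
print(mapping)
```

A table that maps to more than one semantic type, as PLANE_STATISTICS does here, is exactly the case that a strictly one-to-one design could not express.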