2013-11-06 OMERO.features Google Hangout (15:00 GMT / 10:00 EST)
Chris Coletta, Lee Kamentsky, Simon Li, Ivan Cao-Berg (15:30)
S: Last time we more or less agreed on the idea of storing features in a 2D array with metadata in the form of key-value annotations.
L: HDF5 should work as a storage mechanism; it supports key-value attributes on arrays.
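A minimal sketch of the layout agreed last time: features are held as a 2D array (one row per object, one column per feature), plus table-level key-value metadata that mirrors the attributes HDF5 can attach to a dataset. It uses plain Python rather than HDF5. The `FeatureTable` class and the `image_id` key are hypothetical. With h5py, the `attrs` mapping on a dataset would play the role of `metadata`.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureTable:
    """Hypothetical in-memory model (not an OMERO or HDF5 API):
    a 2D feature array plus table-level key-value metadata."""

    column_names: list[str]
    rows: list[list[float]] = field(default_factory=list)
    # Table-level key-value annotations, analogous to HDF5 dataset attributes.
    metadata: dict[str, str] = field(default_factory=dict)

    def add_row(self, values: list[float]) -> None:
        # Each row (e.g. one ROI) must supply one value per feature column.
        if len(values) != len(self.column_names):
            raise ValueError("row length does not match number of columns")
        self.rows.append(list(values))

    @property
    def shape(self) -> tuple[int, int]:
        # (number of rows, number of feature columns)
        return (len(self.rows), len(self.column_names))


table = FeatureTable(["area", "mean_intensity"], metadata={"image_id": "101"})
table.add_row([350.0, 12.5])
table.add_row([410.0, 9.75])
print(table.shape)  # (2, 2)
```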
C: Could we store non-metadata, e.g. classification results and probabilities, as attachments?
L: Add a column to the table.
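L's suggestion of storing classifier output as an extra column could look like the following sketch. Here the table is a plain dict that maps each column name to a list of values. This structure is chosen for illustration and is not OMERO's table API. The column names `predicted_class` and `probability` are hypothetical.

```python
def add_column(table: dict[str, list], name: str, values: list) -> None:
    """Append a new column (e.g. classifier output) to a column-oriented table.

    `table` is a hypothetical dict of column name -> list of values.
    """
    n_rows = len(next(iter(table.values()), []))
    if table and len(values) != n_rows:
        raise ValueError(f"column {name!r} has {len(values)} values, expected {n_rows}")
    if name in table:
        raise ValueError(f"column {name!r} already exists")
    table[name] = list(values)


features = {"roi_id": [1, 2, 3], "area": [350.0, 410.0, 290.0]}
add_column(features, "predicted_class", ["mitotic", "interphase", "interphase"])
add_column(features, "probability", [0.91, 0.78, 0.66])
print(sorted(features))  # ['area', 'predicted_class', 'probability', 'roi_id']
```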
C: Scalability considerations: minimise I/O. Do we have a separate table file per image or per ROI, or group the features for all ROIs in one table? SciDB?
S: Two issues to consider, implementation in OMERO, and how we transfer the features between (non-OMERO) systems.
C: Can tables be related to each other, e.g. multiple versions of features for the same image/ROI? If so, is it up to the user to work out which one to use?
L: Have a relational-style DB that returns the glob of related tables fitting specified criteria.
L: It should be the user's responsibility to deal with the results. If they make a query that matches 15 tables, all 15 should be returned.
S: OMERO can handle the querying logic, but in what format should the results be returned? They need to include sufficient metadata to distinguish between the 15 tables.
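To illustrate the division of labour discussed above, here is a minimal sketch in which OMERO does the querying and the user chooses among the results. Each stored table carries key-value metadata. A query returns every table whose metadata matches, together with that metadata, so the caller can tell the matches apart. The `StoredTable` type and the keys `roi_id`, `version` and `algorithm` are hypothetical. This is not an existing OMERO API.

```python
from dataclasses import dataclass


@dataclass
class StoredTable:
    """Hypothetical record for a stored feature table and its annotations."""

    table_id: int
    metadata: dict[str, str]  # key-value annotations describing the table


def query_tables(tables: list[StoredTable], **criteria: str) -> list[StoredTable]:
    """Return every table whose metadata contains all the given key-value pairs.

    No attempt is made to pick a "best" match: all matches are returned with
    their metadata, and the caller decides which one to use.
    """
    return [
        t
        for t in tables
        if all(t.metadata.get(key) == value for key, value in criteria.items())
    ]


store = [
    StoredTable(1, {"roi_id": "7", "version": "1", "algorithm": "A"}),
    StoredTable(2, {"roi_id": "7", "version": "2", "algorithm": "A"}),
    StoredTable(3, {"roi_id": "8", "version": "1", "algorithm": "B"}),
]
for match in query_tables(store, roi_id="7"):
    # Both versions for ROI 7 come back; their metadata tells them apart.
    print(match.table_id, match.metadata["version"])
```

A real implementation would push the filtering into the server's database. The sketch only shows the contract: every match is returned, and each match comes with its distinguishing metadata.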
L: Hibernate style object graph? Blob wrapping HDF5?
C: A NumPy matrix. Or Blaze, a next-generation NumPy, which allows multiple non-fixed dimensions.
L: HDF5 can serve as a backing store for NumPy; it arranges chunks of data optimally and supports sparseness.
C: So OMERO.features needs to return multiple objects: key-value map(s) and a NumPy matrix. Related: will be attending the PyData conference in New York; others might wish to go.
L: Likes the idea of using a ROI ID as an identifier and keeping the details of the ROI/image/etc. separate from the features. What is the current state of the ROI-specification work? Would it cope with HCS data?
S: On hold pending a grant. In the meantime there's scope for us to provide requirements or suggest changes.
L: Consider CellH5. Will talk to Christoph Sommer.
S: Are we going to define some key-value pairs? Are they attached only to tables, or also to rows and columns?
L: HDF5 only supports per-table key-value attributes. Row annotations could be stored as another column; column annotations would require a separate table where each row holds the metadata for one column.
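Because only table-level attributes are available, L's workaround can be sketched as follows. A per-row annotation becomes one more column. Per-column annotations move into a second table that has one row per column of the feature table. Plain Python lists stand in for HDF5 datasets. The names `quality_flag`, `units` and `description` are hypothetical.

```python
# Main feature table: one row per object, one column per feature.
feature_columns = ["area", "mean_intensity"]
feature_rows = [
    [350.0, 12.5],
    [410.0, 9.75],
]

# Row annotation (hypothetical "quality_flag"): stored as an extra column.
feature_columns.append("quality_flag")
for row, flag in zip(feature_rows, ["ok", "out_of_focus"]):
    row.append(flag)

# Column annotations: a separate table where each row describes one column.
column_metadata_header = ["column", "units", "description"]
column_metadata_rows = [
    ["area", "pixels", "number of pixels inside the ROI"],
    ["mean_intensity", "counts", "mean pixel value inside the ROI"],
    ["quality_flag", "", "row-level annotation, not a feature"],
]


def column_info(name: str) -> dict[str, str]:
    """Look up the metadata row describing a given column of the feature table."""
    for row in column_metadata_rows:
        if row[0] == name:
            return dict(zip(column_metadata_header, row))
    raise KeyError(name)


print(column_info("area")["units"])  # pixels
```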
S: OMERO should be able to handle returning multiple keys/values/tables.
I: Any restrictions on keys? Everyone has different requirements.
L: Metadata in OMERO is good, but at the feature level most people don't care. Attempting to standardise feature names is hard: e.g. mean intensity could be normalised before or after calculations, and areas could be interpolated at boundaries. Maybe have some restrictions, e.g. image channel, ROI ID.
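One reading of "maybe have some restrictions" is a small set of required keys, with every other key left free-form. The sketch below checks annotations against such a set. The key names `image_channel` and `roi_id` are placeholders for whatever the group eventually standardises. They are not agreed names.

```python
# Hypothetical minimal set of standard keys; everything else is free-form.
REQUIRED_KEYS = {"image_channel", "roi_id"}


def missing_required_keys(annotations: dict[str, str]) -> set[str]:
    """Return the required keys absent from a set of key-value annotations.

    Additional keys are accepted unchecked, since feature-level naming
    (e.g. how mean intensity was normalised) is hard to standardise.
    """
    return REQUIRED_KEYS - set(annotations)


ok = {"image_channel": "0", "roi_id": "42", "normalisation": "after"}
incomplete = {"roi_id": "42"}
print(missing_required_keys(ok))          # set()
print(missing_required_keys(incomplete))  # {'image_channel'}
```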
C: Have a standard set of base-level keys. If metadata is stored alongside the features in the same table (e.g. ROI ID), we need to know which columns are features and which are metadata.
L: Describe this using a key-value pair with a standard name, or define a datatype for the column.
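Since HDF5 attributes are per-table, the "key-value pair with a standard name" option could record each column's role in the table's own attributes. Below is a minimal sketch. It assumes a hypothetical key convention, `column_role:<name>`, whose value is either `metadata` or `feature`.

```python
ROLE_PREFIX = "column_role:"  # hypothetical standard key prefix


def split_columns(
    columns: list[str], attributes: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Partition column names into (metadata, features) using table attributes.

    Columns with no declared role are treated as features by default.
    """
    metadata, features = [], []
    for name in columns:
        role = attributes.get(ROLE_PREFIX + name, "feature")
        if role == "metadata":
            metadata.append(name)
        elif role == "feature":
            features.append(name)
        else:
            raise ValueError(f"unknown role {role!r} for column {name!r}")
    return metadata, features


columns = ["roi_id", "image_channel", "area", "mean_intensity"]
attributes = {
    "column_role:roi_id": "metadata",
    "column_role:image_channel": "metadata",
    "column_role:area": "feature",
}
print(split_columns(columns, attributes))
```

Treating undeclared columns as features by default is only one possible choice. The sketch also gives each column exactly one role, so a column such as location, which could be both metadata and a feature, would need a richer value. The other option in the notes, a dedicated column datatype, would make the role part of the schema instead of the attributes.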
S: Something like location could be both metadata and a feature. We should spend the next week or two thinking about the standard key-value pairs and column types we need.
I: Can you store an array of features in a single cell?
L: A 2D array would be nice.
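Storing an array of features in a single cell amounts to a table whose cells can hold nested arrays. One example would be one intensity histogram per channel, which gives a channels × bins array for each ROI. Below is a minimal stdlib sketch. It assumes that every cell in a given column holds a 2D array of the same declared shape. That assumption is made here for illustration, and the `histograms` column is hypothetical.

```python
Array2D = list[list[float]]


def check_shape(cell: Array2D, shape: tuple[int, int]) -> None:
    """Raise if a cell's nested array does not have the column's declared shape."""
    if len(cell) != shape[0] or any(len(r) != shape[1] for r in cell):
        raise ValueError(f"expected shape {shape}")


# Hypothetical table: one row per ROI; the "histograms" column holds a
# channels x bins array in each cell (2 channels, 3 bins here).
HIST_SHAPE = (2, 3)
rows = [
    {"roi_id": 1, "histograms": [[5.0, 3.0, 1.0], [2.0, 2.0, 4.0]]},
    {"roi_id": 2, "histograms": [[0.0, 6.0, 2.0], [1.0, 1.0, 1.0]]},
]
for row in rows:
    check_shape(row["histograms"], HIST_SHAPE)

# Stacking the cells gives a 3D block: ROIs x channels x bins.
block = [row["histograms"] for row in rows]
print(len(block), len(block[0]), len(block[0][0]))  # 2 2 3
```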
S: Discussion to be continued on email list.