
FAQ

This is a legacy page!

You probably want the current FAQ

Frequently Asked Questions about OME-XML in the legacy OME Server

  • Why do we need a new format? Why can't we just use TIFF?

    In the standard application of the TIFF format, all meta-data is lost. The only things left are the pixels themselves. TIFF provides a mechanism (custom tags) to add meta-data, but there would have to be agreement on the meta-data, the tags, and their structure - which is essentially what we've accomplished in XML.

    The file format itself isn't new - XML is used everywhere, and it's becoming ubiquitous as a native file format. Even Microsoft's Office will soon switch its native format to XML. All we've really done is use an existing format to define a set of meta-data that should be included to describe microscopy images, much in the same spirit as MIAME/MAGE for microarrays (also XML).

    In other words, this isn't a new file format - it's a new set of meta-data.

    On Dec 11, 2005, the LOCI team announced a variant form of OME-XML, known as OME-TIFF. This format combines the meta-data facilities of OME-XML with the standardized storage of binary data via TIFF. The new format is ideal for storing data for visualization and analysis.

  • Why use XML?

    XML is being used everywhere to represent complex structured information. There are several reasons why everybody is jumping on the XML bandwagon. Principally, they are:

    • XML is not binary; it is human-readable plain text.
    • There is a lot of open-source and off-the-shelf software that can parse XML and do interesting things with it that are not possible with other formats (XML databases, search tools, viewers, editors, etc.).
    • The structure of XML documents is inherently open. There is little if any reverse-engineering necessary to understand a new XML format, even when there is no governing schema.
    • XML Schema provides an unambiguous description of a document's structure, which allows for validation, conformity testing, and debugging using third-party tools.
    • XML makes it straightforward to represent information that has an inherently complex semantic structure.
  • How can you represent binary image data in XML using plain text?

    Binary data is represented in XML using Base64 encoding (see RFC 1341, Section 5.2). Binary data is a string of numbers in base 2, represented by a series of bits (binary digits). These same numbers can be represented in base 64 using a series of base 64 digits. The convention for representing base 64 digits is to use the upper- and lower-case letters (52 digits), the ten numerals 0-9 (bringing the total to 62), and the two punctuation characters '+' and '/' (total = 64).

    Since each byte can represent a value from 0 to 255, we would need two base 64 digits to represent it. If we did that, we would need twice the storage to convert binary into plain text. Instead, we can convert 3 bytes at a time into a 4-digit base 64 number. This makes the plain-text representation of binary data "only" 1/3 larger. This still seems too big, but read the next question.
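
    As a quick illustration, here is a minimal sketch in Python (using only the standard-library base64 module) of the 3-bytes-to-4-digits conversion and the resulting size ratio; the sample data is arbitrary:

        import base64

        raw = bytes(range(256)) * 3        # 768 bytes of arbitrary binary data
        encoded = base64.b64encode(raw)    # every 3 input bytes -> 4 base 64 digits

        print(len(raw), len(encoded))      # 768 1024
        print(len(encoded) / len(raw))     # 1.333... ("only" 1/3 larger)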

  • Aren't XML files really huge?

    The verbose nature of XML, the fact that it's "human readable", and the Base64 encoding of binary data make XML files much larger than their binary counterparts. However, there are two ways to get around this, which make actual OME-XML files much smaller than the same images stored in other formats.

    OME-XML specifies compression for the image pixels using one of two very common patent-free open-source compression schemes: gzip and bzip2. Image data is highly compressible by these algorithms: by a factor of 3-4 for gzip, and even higher for bzip2. So even though we inflate the compressed binary data by 33% to represent it in plain text, we first shrink it to 1/4 or 1/3 of its original size. In order to preserve random access to image planes in OME-XML even when the image is 5-dimensional, we specify that each plane has to be compressed separately.
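
    A minimal sketch of per-plane compression in Python, using the standard-library gzip module (the plane size and count here are arbitrary placeholders):

        import gzip

        # Ten 512x512 8-bit planes of a hypothetical 5-D image, each a bytes object.
        planes = [bytes(512 * 512) for _ in range(10)]

        # Compress each plane separately so any single plane can be
        # decompressed on its own, preserving random access.
        compressed = [gzip.compress(p) for p in planes]

        plane7 = gzip.decompress(compressed[7])   # fetch one plane without touching the rest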

    XML files are themselves highly compressible. Although this is more of a de-facto convention than an actual standard, most XML libraries will read and write XML files compressed with gzip transparently (i.e., they "just do it"). We recommend using the file extension ".ome" for uncompressed plain-text files, and ".omez" for gzip-compressed ones. This is exactly how the Scalable Vector Graphics (SVG) format works.
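
    For example, here is a small Python sketch of a reader that handles both extensions transparently (the file name is hypothetical):

        import gzip
        import xml.etree.ElementTree as ET

        def read_ome(path):
            # ".omez" files are gzip-compressed; ".ome" files are plain text.
            opener = gzip.open if path.endswith(".omez") else open
            with opener(path, "rb") as f:
                return ET.parse(f)

        tree = read_ome("example.omez")
        print(tree.getroot().tag)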

  • What about performance? Doesn't all this compression make XML parsing even slower and more resource-intensive than it already is?

    Decompression does put an additional burden on the CPU, and it has to be done each time the file is read. Compression puts an even bigger burden on the CPU, but it only has to be done once. Increases in CPU speed have long outpaced increases in drive speed: a modern CPU spends a great deal of time waiting around for the next byte to be read from disk. Rather than twiddling its thumbs, it could be busy decompressing the bytes as they come off the disk. In actual practice, the difference between CPU speed and drive speed on a modern computer is so great that the CPU still spends a lot of its time waiting for the disk even when it's decompressing what it's reading at the same time. So we would argue that decompression is more resource-intensive but, more importantly, more resource-efficient, and it can potentially result in faster read times.
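
    To make the overlap concrete, here is a sketch in Python of incremental decompression, where the CPU decompresses each chunk while the next disk read is still pending (the chunk size and file name are arbitrary):

        import zlib

        def stream_decompress(path, chunk_size=64 * 1024):
            # wbits = MAX_WBITS | 16 tells zlib to expect gzip framing.
            d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    yield d.decompress(chunk)   # CPU works between disk reads
                yield d.flush()

        for block in stream_decompress("example.omez"):
            pass   # process the decompressed bytes here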

    There are open-source libraries available for parsing XML in C, C++, Java, and other languages. These are high-performance libraries, as good as or better than some commercial implementations. Because of the broad support for XML and the availability and active development of parsers, any lingering performance issues with XML itself are likely to be resolved soon. Part of the advantage of using XML is that we can make use of the efforts of a much larger community than those involved in microscopy, or even imaging in general.
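
    As one example, Python's standard library alone can stream-parse arbitrarily large XML files with bounded memory; similar facilities exist in the C, C++, and Java libraries mentioned above (the file name here is hypothetical):

        import xml.etree.ElementTree as ET

        # Stream-parse an OME-XML file, releasing each element once it
        # has been handled so memory use stays flat.
        for event, elem in ET.iterparse("example.ome", events=("end",)):
            # ... inspect elem.tag / elem.attrib here ...
            elem.clear()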

    It's important to note that our goal in developing and promoting OME-XML is, and always has been, interoperability. It was not necessarily intended as a native storage format for high-performance imaging systems. In fact, our own OME software does not use this format natively for rapid random access to pixels. In other words, if we did compromise some of the performance potential (and we don't believe we did, much if at all), it was done to preserve interoperability and adhere to accepted standards.

  • How do I get new kinds of meta-data incorporated into OME-XML?

    This is one of the great strengths of XML. Its non-binary nature makes it easily extensible without breaking backwards compatibility. OME-XML has a built-in mechanism for dynamically defining new meta-data. See the section on Semantic Types for more information and examples. In other words, new types of meta-data can be stored in OME-XML immediately, without changing the OME-XML Schema itself.
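
    As a purely hypothetical sketch (the element and attribute names below are illustrative, not copied from the OME-XML Schema - see the Semantic Types section for the real syntax), a new type definition might be written and read back like this:

        import xml.etree.ElementTree as ET

        # Illustrative only: a made-up Semantic Type describing an objective coating.
        fragment = """
        <SemanticTypeDefinitions>
          <SemanticType Name="ObjectiveCoating" AppliesTo="I">
            <Element Name="CoatingName" DataType="string"/>
          </SemanticType>
        </SemanticTypeDefinitions>
        """

        root = ET.fromstring(fragment)
        print(root.find("SemanticType").get("Name"))   # ObjectiveCoating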

    The existing mechanism for defining new meta-data has an inherent danger: everyone may use it to define very similar meta-data in their own special way. This is not completely avoidable and can only be mitigated. First, one proposal is to establish a centralized repository of Semantic Types, so that the first step in defining new meta-data would be checking whether someone else has already done it. Second, as similar or equivalent Semantic Types get used again and again by different manufacturers and imaging groups, we will attempt to form a consensus among those interested on the precise structure of this new meta-data and incorporate it into the schema. Each new schema will have a new version, and every attempt will be made to avoid conflicts with previous versions. We envision that this versioning process will occur rarely - at most once or maybe twice a year. This isn't because we're lazy; it's because standards need stability and must not become moving targets.
