In 2020, we published two blog posts reporting on the development of next-generation file formats (NGFFs). We announced public specifications and shared public examples of multiscale images and label images in November 2020 and high-content screening (HCS) datasets in December 2020.
Here we report progress made on OME-NGFF over the course of 2021 with contributions from OME, Glencoe Software, EMBL and Harvard Medical School.
The principle of next-generation file formats as a solution for bioimaging data storage and access was recently published as a Brief Communication in Nature Methods. This work is driven by use cases in image data publication and in domains with large bioimaging datasets (public data resources, high-content screening and light sheet microscopy), with direct contributions from all the groups mentioned above. We have tried to make the work as widely applicable as possible while delivering examples of real-world solutions.
This paper includes a benchmark that measures the access times for different types of data chunks. The reading performance of NGFFs is compared to that of the established TIFF and HDF5 formats for different imaging modalities on different types of storage. These results demonstrate the benefits of the different formats under different access scenarios, e.g., on a local computer or in the cloud. We discuss the advantages and tradeoffs of each format in various contexts. We believe there is no single format that provides optimal performance in all scenarios. The corollary is that any data-intensive imaging project will have to consider these issues and make informed choices about the data structures it uses. See the “Outlook” section of the paper for more details.
Preliminary results for the latency benchmark of the OME-NGFF paper revealed performance issues when accessing Zarr chunks remotely for some modalities. Our tests showed that these issues were related to the separator used between chunks in the Zarr format. Version 0.2 includes a backwards-incompatible change to use slash (/) as the separator between individual chunks rather than dot (.). This separator must be used for all multiscale images of all modalities, including label images and high-content screening fields of view. Moving forward, this restriction will be eased, but the underlying libraries like Zarr will be moving to / as the default.
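To make the separator change concrete, here is a minimal sketch of how the same chunk is named under each convention. The `chunk_key` helper and the example indices are hypothetical and written only for illustration; they are not part of any library. The `dimension_separator` entry is, to our understanding, the key that Zarr v2 array metadata uses to record the choice.

```python
import json


def chunk_key(indices, separator="/"):
    """Join a chunk's grid indices into the key it is stored under.

    Hypothetical helper for illustration only.
    """
    return separator.join(str(i) for i in indices)


# Grid position of one chunk in a 5D array (example values only).
indices = (0, 1, 4, 2, 3)

# Convention used before OME-NGFF 0.2: one flat key per chunk.
flat_key = chunk_key(indices, separator=".")

# Convention required from OME-NGFF 0.2: one nested path per chunk.
nested_key = chunk_key(indices, separator="/")

# Assumption: Zarr v2 array metadata (.zarray) records the separator
# under the "dimension_separator" key.
zarray_fragment = {"dimension_separator": "/"}

print(flat_key)
print(nested_key)
print(json.dumps(zarray_fragment))
```

With the dot separator, every chunk of an array sits side by side in a single listing. With the slash, chunks are spread across nested directories or object-store prefixes.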
Up to version 0.2, the axes of multiscale images were implicitly assumed to be XYZCT. Version 0.3 of the OME-NGFF specification loosened this requirement by introducing a mandatory axes attribute in the multiscales specification. This allows images of any dimensionality between 2D and 5D. For instance, it is possible to store a time-lapse 2D image using XYT or a three-dimensional volume as XYZ.
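As a rough illustration, the sketch below builds the multiscales metadata for the two cases just mentioned. The `make_multiscales` helper is hypothetical. The exact layout is our assumption and should be checked against the 0.3 specification text: lowercase axis names listed in array order, and a `datasets` list with one `path` per resolution level. The checks only encode what is stated above (between two and five axes, drawn from X, Y, Z, C and T); the specification may impose further constraints.

```python
import json

# Axis names assumed to be allowed, in lowercase (assumption).
ALLOWED_AXES = ("t", "c", "z", "y", "x")


def make_multiscales(axes, dataset_paths, version="0.3"):
    """Build a multiscales metadata dict with an explicit axes list.

    Hypothetical helper for illustration only.
    """
    axes = list(axes)
    if not 2 <= len(axes) <= 5:
        raise ValueError("expected between 2 and 5 axes")
    if len(set(axes)) != len(axes):
        raise ValueError("axis names must be unique")
    unknown = [a for a in axes if a not in ALLOWED_AXES]
    if unknown:
        raise ValueError(f"unknown axis names: {unknown}")
    return {
        "multiscales": [
            {
                "version": version,
                "axes": axes,
                # One entry per resolution level, largest first.
                "datasets": [{"path": p} for p in dataset_paths],
            }
        ]
    }


# A 2D time-lapse (XYT) with three resolution levels.
timelapse_2d = make_multiscales(["t", "y", "x"], ["0", "1", "2"])

# A three-dimensional volume (XYZ) with a single level.
volume_3d = make_multiscales(["z", "y", "x"], ["0"])

print(json.dumps(timelapse_2d, indent=2))
```

Because the axes are stated explicitly, a reader no longer has to assume five dimensions and can interpret a three-dimensional array as either XYT or XYZ.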
The two features discussed above (dimension separator and 2D-5D axes) are published in the latest OME-NGFF specification, currently at version 0.3. We generated a comprehensive set of 0.3 OME-NGFF samples covering the current specification:
Study | Axes | Dimensions | Others |
---|---|---|---|
idr0079 | XYZC | 1584 x 788 x 142 x 2 | labels (0) |
idr0052 | XYZCT | 256 x 256 x 31 x 3 x 40 | labels (cells, chromosomes) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0095 | XYC | 2048 x 2048 x 3 | labels (0) |
idr0109 | XYT | 2560 x 2160 x 721 | |
idr0075 | XYZ | 512 x 512 x 91 | |
idr0051 | XYZT | 333 x 333 x 201 x 79 | |
idr0040 | XYCT | 2048 x 2048 x 5 x 20 | |
idr0094 | XY | 1080 x 1080 | 96 wells |
— December 16, 2021