Merge branch 'main' into doc/omero
giovp committed Jul 6, 2023
2 parents ef4e605 + b7359aa commit bb69bc5
Showing 12 changed files with 194 additions and 86 deletions.
1 change: 1 addition & 0 deletions .github/workflows/review.yml
@@ -21,6 +21,7 @@ jobs:
issue-number: ${{ github.event.pull_request.number }}
body: |
#### Automated Review URLs
* [Readthedocs](https://ngff--${{ github.event.pull_request.number }}.org.readthedocs.build/)
* [render latest/index.bs](http://api.csswg.org/bikeshed/?url=https://raw.githubusercontent.com/ome/ngff/${{ github.event.pull_request.head.sha }}/latest/index.bs)
* [diff latest modified](https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fngff.openmicroscopy.org%2Flatest%2F&doc2=http%3A%2F%2Fapi.csswg.org%2Fbikeshed%2F%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fome%2Fngff%2F${{ github.event.pull_request.head.sha }}%2Flatest%2Findex.bs)
edit-mode: replace
2 changes: 2 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
[![DOI](https://zenodo.org/badge/313652456.svg)](https://zenodo.org/badge/latestdoi/313652456)

# ome-ngff

[Next-generation file format (NGFF) specifications](https://ngff.openmicroscopy.org/latest/) for storing bioimaging data in the cloud.
52 changes: 51 additions & 1 deletion about/index.md
@@ -1,6 +1,56 @@
About
=====

Bioimaging science is at a crossroads. Currently, the drive to acquire more,
larger, and more precise spatial measurements is unfortunately at odds with our
ability to structure and share those measurements with others. Now more than
ever, during a global pandemic, we believe fervently that global, collaborative
discovery, as opposed to the post-publication, "data-on-request" mode of
operation, is the path forward. Bioimaging data should be shareable via open
and commercial cloud resources without the need to download entire datasets.

At the moment, that is not the norm. The plethora of data formats produced by
imaging systems is ill-suited to remote sharing. Individual scientists
typically lack the infrastructure they need to host these data themselves. When
they acquire images from elsewhere, time-consuming translations and data
cleaning are needed to interpret findings. Those same costs are multiplied when
gathering data into online repositories, where curator time can be the limiting
factor before publication is possible. Without a common effort, each lab or
resource is left building the tools it needs and maintaining that
infrastructure, often without dedicated funding.

This document defines a specification for bioimaging data that makes it
possible to convert proprietary formats into a common, cloud-ready one. Such
next-generation file formats lay out data so that individual portions, or
"chunks", of large data are referenceable, eliminating the need to download
entire datasets.
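The chunking idea can be sketched in plain Python: given a chunk shape, only the chunks overlapping a requested region ever need to be fetched. This is an illustrative sketch, not part of the specification; the function name is made up for the example.

```python
from itertools import product
from math import ceil

def chunks_for_region(region, chunk_shape):
    """Return the grid indices of every chunk overlapping `region`.

    `region` is a tuple of (start, stop) pairs, one per dimension;
    `chunk_shape` gives the chunk length along each dimension.
    """
    ranges = [
        range(start // c, ceil(stop / c))
        for (start, stop), c in zip(region, chunk_shape)
    ]
    return list(product(*ranges))

# Reading a 150x50 window from a large 2D image stored in 100x100
# chunks touches only two chunks, not the whole array:
print(chunks_for_region(((200, 350), (0, 50)), (100, 100)))
# → [(2, 0), (3, 0)]
```

A reader (or a cloud store client) then requests exactly those chunk keys and nothing else.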


Why "NGFF"?
-----------

A short description of what is needed for an imaging format is "a hierarchy
of n-dimensional (dense) arrays with metadata". This combination of features
is certainly provided by `HDF5`
from the [HDF Group](https://www.hdfgroup.org), which a number of
bioimaging formats do use. HDF5 and other large binary structures, however,
are ill-suited for storage in the cloud, where accessing individual chunks
of data by name, rather than seeking through a large file, is at the heart of
parallelization.

As a result, a number of formats have been developed more recently which provide
the basic data structure of an HDF5 file, but do so in a more cloud-friendly way.
In the [PyData](https://pydata.org/) community, the Zarr [[zarr]] format was developed
for easily storing collections of [NumPy](https://numpy.org/) arrays. In the
[ImageJ](https://imagej.net/) community, N5 [[n5]] was developed to work around
the limitations of HDF5 ("N5" was originally short for "Not-HDF5").
Both of these formats permit storing individual chunks of data either locally in
separate files or in cloud-based object stores as separate keys.
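As a sketch of that key-based layout: in Zarr v2 a chunk's grid index maps directly to a store key (by default joined with `.`; N5 uses a nested `path/…` style instead). The helper name below is invented for the illustration.

```python
def zarr_v2_chunk_key(chunk_index, dimension_separator="."):
    """Map a chunk's grid index to its store key, e.g. (2, 0) -> "2.0".

    In a directory store the key is a file name; in a cloud object
    store it is simply part of the object's key.
    """
    return dimension_separator.join(str(i) for i in chunk_index)

print(zarr_v2_chunk_key((2, 0)))       # → 2.0
print(zarr_v2_chunk_key((1, 3), "/"))  # → 1/3
```

Because each chunk is an independent file or object, readers can fetch chunks in parallel by name alone, with no seeking inside one large file.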

An [updated Zarr version (v3)](https://zarr-specs.readthedocs.io/)
is underway to unify the two similar specifications to provide a single binary
specification. See this [blog post](https://zarr.dev/blog/zep1-update/) for more information.

In addition to the next-generation file format (NGFF) [specifications](../specifications/index.md),
the pages listed below are intended to provide an overview of external resources available
for working with NGFF data.
@@ -12,7 +12,7 @@ The following pages are intended to provide an overview of the available resourc
* [Publications](../publications/index.md): List of publications referencing OME-NGFF or publishing
datasets in OME-Zarr.

Additionally, notes and recordings of the passt NGFF community calls are available:
Additionally, notes and recordings of the past NGFF community calls are available:

| Call | Date | Presenters | Forum thread | Notes |
|------|------|------------|--------------|-------|
27 changes: 14 additions & 13 deletions data/index.md
@@ -1,19 +1,20 @@
Data Resources
==============

| Catalog | Descriptions | Zarr Files | Size |
| ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------- | ------------ | ------- |
| [BIA Samples](https://bit.ly/bia-ome-ngff-samples) | Hosting provided by EBI | 90 | 200GB |
| [CZB-Zebrahub](https://zebrahub.ds.czbiohub.org/imaging) | Hosting provided by czbiohub | 5 | 1.2TB |
| [Glencoe](https://glencoesoftware.com/ngff) | Hosting provided by Glencoe Software, Inc. | TBD | TBD |
| [DANDI](https://dandiarchive.org/dandiset/000108) ([identifiers.org][dandi2],[github][dandi3]) | Hosting provided by AWS Open Data Program | 3914 | 355TB |
| [EMBL-HD](https://mobie.github.io/specs/ngff.html) | Hosting provided by EMBL | 21 | TBD |
| [IDR Samples](https://idr.github.io/ome-ngff-samples/) | Hosting provided by EBI | 88 | 3TB |
| [Neural Dynamics](https://registry.opendata.aws/allen-nd-open-data/) | Hosting provided by AWS Open Data Program | 90 | 200TB |
| [Sanger](https://www.sanger.ac.uk/project/ome-zarr/) | Hosting provided by Sanger, UK | 10 | 1TB |
| [SpatialData](https://github.com/scverse/spatialdata-notebooks/tree/main/datasets) | Hosting provided by EMBL | 10 | 25GB |
| [webKnossos](https://zarr.webknossos.org) | Hosting provided by scalableminds GmbH | 69 | 70TB |
| [SSBD](https://ssbd.riken.jp/ssbd-ome-ngff-samples) | Hosting provided by SSBD | 12 | 196GB |
| Catalog | Hosting | Zarr Files | Size |
| ------------------------------------------------------------------------ | -----------------------------------------------------| ------------ | -------- |
| [BIA Samples](https://bit.ly/bia-ome-ngff-samples) | EBI | 90 | 200 GB |
| [Cell Painting Gallery](https://github.com/broadinstitute/cellpainting-gallery) | AWS Open Data Program | 136 | 20 TB |
| [CZB-Zebrahub](https://zebrahub.ds.czbiohub.org/imaging) | czbiohub | 5 | 1.2 TB |
| [DANDI](https://dandiarchive.org/dandiset/000108) ([identifiers.org][dandi2],[github][dandi3]) | AWS Open Data Program | 3914 | 355 TB |
| [Glencoe](https://glencoesoftware.com/ngff) | Glencoe Software, Inc. | 8 | 165 GB |
| [IDR Samples](https://idr.github.io/ome-ngff-samples/) | EBI | 88 | 3 TB |
| [MoBIE](https://mobie.github.io/specs/ngff.html) | EMBL-HD | 21 | 2 TB |
| [Neural Dynamics](https://registry.opendata.aws/allen-nd-open-data/) | AWS Open Data Program | 90 | 200 TB |
| [Sanger](https://www.sanger.ac.uk/project/ome-zarr/) | Sanger, UK | 10 | 1 TB |
| [SpatialData](https://github.com/scverse/spatialdata-notebooks/tree/main/datasets) | EMBL-HD | 10 | 25 GB |
| [SSBD](https://ssbd.riken.jp/ssbd-ome-ngff-samples) | SSBD | 12 | 196 GB |
| [webKnossos](https://zarr.webknossos.org) | scalableminds GmbH | 69 | 70 TB |

[dandi2]: https://identifiers.org/DANDI:000108
[dandi3]: https://github.com/dandisets/000108
11 changes: 8 additions & 3 deletions index.rst
@@ -6,6 +6,14 @@
Next-generation file formats (NGFF)
===================================

OME-NGFF is an imaging format being developed by the bioimaging community to
address issues of scalability and interoperability.
Please see the :doc:`about/index` section for an introduction.
The OME-NGFF specification is detailed under :doc:`specifications/index`.
Various image viewers and other software for working with NGFF data
are listed on the :doc:`tools/index` page.
Sample NGFF datasets provided by the community can be found under :doc:`data/index`.

.. toctree::
:maxdepth: 2

@@ -20,6 +28,3 @@ Next-generation file formats (NGFF)
.. raw:: html

<script type="text/javascript">
window.location.replace('latest/index.html');
</script>
61 changes: 5 additions & 56 deletions latest/index.bs
@@ -26,60 +26,6 @@ Status Text: will be provided between numbered versions. Data written with these
Status Text: (an "editor's draft") will not necessarily be supported.
</pre>

Introduction {#intro}
=====================

Bioimaging science is at a crossroads. Currently, the drive to acquire more,
larger, and more precise spatial measurements is unfortunately at odds with our
ability to structure and share those measurements with others. Now more than
ever, during a global pandemic, we believe fervently that global, collaborative
discovery, as opposed to the post-publication, "data-on-request" mode of
operation, is the path forward. Bioimaging data should be shareable via open
and commercial cloud resources without the need to download entire datasets.

At the moment, that is not the norm. The plethora of data formats produced by
imaging systems is ill-suited to remote sharing. Individual scientists
typically lack the infrastructure they need to host these data themselves. When
they acquire images from elsewhere, time-consuming translations and data
cleaning are needed to interpret findings. Those same costs are multiplied when
gathering data into online repositories, where curator time can be the limiting
factor before publication is possible. Without a common effort, each lab or
resource is left building the tools it needs and maintaining that
infrastructure, often without dedicated funding.

This document defines a specification for bioimaging data that makes it
possible to convert proprietary formats into a common, cloud-ready one. Such
next-generation file formats lay out data so that individual portions, or
"chunks", of large data are referenceable, eliminating the need to download
entire datasets.


Why "<dfn export="true"><abbr title="Next-generation file-format">NGFF</abbr></dfn>"? {#why-ngff}
-------------------------------------------------------------------------------------------------

A short description of what is needed for an imaging format is "a hierarchy
of n-dimensional (dense) arrays with metadata". This combination of features
is certainly provided by <dfn export="true"><abbr title="Hierarchical Data Format 5">HDF5</abbr></dfn>
from the <a href="https://www.hdfgroup.org">HDF Group</a>, which a number of
bioimaging formats do use. HDF5 and other large binary structures, however,
are ill-suited for storage in the cloud, where accessing individual chunks
of data by name, rather than seeking through a large file, is at the heart of
parallelization.

As a result, a number of formats have been developed more recently which provide
the basic data structure of an HDF5 file, but do so in a more cloud-friendly way.
In the [PyData](https://pydata.org/) community, the Zarr [[zarr]] format was developed
for easily storing collections of [NumPy](https://numpy.org/) arrays. In the
[ImageJ](https://imagej.net/) community, N5 [[n5]] was developed to work around
the limitations of HDF5 ("N5" was originally short for "Not-HDF5").
Both of these formats permit storing individual chunks of data either locally in
separate files or in cloud-based object stores as separate keys.

A [current effort](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html)
is underway to unify the two similar specifications to provide a single binary
specification. The editor's draft will soon be entering a [request for comments (RFC)](https://github.com/zarr-developers/zarr-specs/issues/101) phase with the goal of having a first version early in 2021. As that
process comes to an end, this document will be updated.

OME-NGFF {#ome-ngff}
--------------------

@@ -380,7 +326,7 @@ Each "multiscales" dictionary MAY contain the field "coordinateTransformations",
The transformations MUST follow the same rules about allowed types, order, etc. as in "datasets:coordinateTransformations" and are applied after them.
They can for example be used to specify the `scale` for a dimension that is the same for all resolutions.

Each "multiscales" dictionary SHOULD contain the field "name". It SHOULD contain the field "version", which indicates the version of the multiscale metadata of this image (current version is [NGFFVERSION]).
Each "multiscales" dictionary SHOULD contain the field "name". It MUST contain the field "version", which indicates the version of the multiscale metadata of this image (current version is [NGFFVERSION]).

Each "multiscales" dictionary SHOULD contain the field "type", which gives the type of downscaling method used to generate the multiscale image pyramid.
It SHOULD contain the field "metadata", which contains a dictionary with additional information about the downscaling method.
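To make those rules concrete, here is a minimal sketch of a "multiscales" entry as a Python dictionary. The axis names, scale values, and downscaling metadata are invented for the illustration; the final check mirrors the normative text ("version" is now required, alongside "datasets" and "axes").

```python
multiscale = {
    "version": "0.4",            # MUST be present
    "name": "example",           # SHOULD be present
    "type": "gaussian",          # SHOULD: downscaling method used
    "metadata": {"sigma": 2.0},  # SHOULD: details of that method
    "axes": [
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ],
    "datasets": [
        {
            "path": "0",
            "coordinateTransformations": [
                {"type": "scale", "scale": [0.5, 0.5]}
            ],
        }
    ],
    # Applied after each dataset's own transformations, e.g. a
    # scale shared by all resolution levels:
    "coordinateTransformations": [
        {"type": "scale", "scale": [1.0, 1.0]}
    ],
}

# Mirrors the updated image.schema "required" list:
assert {"datasets", "axes", "version"} <= multiscale.keys()
```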
@@ -546,7 +492,7 @@ contain only alphanumeric characters, MUST be case-sensitive, and MUST NOT be a
other `name` in the `rows` list. Care SHOULD be taken to avoid collisions on
case-insensitive filesystems (e.g. avoid using both `Aa` and `aA`).

The `plate` dictionary SHOULD contain a `version` key whose value MUST be a string specifying the
The `plate` dictionary MUST contain a `version` key whose value MUST be a string specifying the
version of the plate specification.

The `plate` dictionary MUST contain a `wells` key whose value MUST be a list of JSON objects
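Analogously, a minimal `plate` dictionary consistent with the rules above might look as follows; the row, column, and well values are invented for the example, and the check reflects the version key becoming mandatory.

```python
plate = {
    "version": "0.4",  # now MUST be present (previously SHOULD)
    "columns": [{"name": "1"}],
    "rows": [{"name": "A"}],
    "wells": [
        # One JSON object per well in the plate:
        {"path": "A/1", "rowIndex": 0, "columnIndex": 0}
    ],
}

# Mirrors the updated plate.schema "required" list:
assert {"columns", "rows", "wells", "version"} <= plate.keys()
```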
@@ -646,6 +592,9 @@ Projects which support reading and/or writing OME-NGFF data include:
<dt><strong>[vizarr](https://github.com/hms-dbmi/vizarr/)</strong></dt>
<dd>A minimal, purely client-side program for viewing Zarr-based images with Viv & ImJoy.</dd>

<dt><strong>[ITKIOOMEZarrNGFF](https://github.com/InsightSoftwareConsortium/ITKIOOMEZarrNGFF/)</strong></dt>
<dd>ITK IO for images stored in OME-NGFF format.</dd>

</dl>

<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png" alt="Diagram of related projects"></img>
2 changes: 1 addition & 1 deletion latest/schemas/image.schema
@@ -44,7 +44,7 @@
}
},
"required": [
"datasets", "axes"
"datasets", "axes", "version"
]
},
"minItems": 1,
2 changes: 1 addition & 1 deletion latest/schemas/plate.schema
@@ -133,7 +133,7 @@
}
},
"required": [
"columns", "rows", "wells"
"columns", "rows", "wells", "version"
]
}
}
41 changes: 39 additions & 2 deletions latest/tests/image_suite.json
@@ -90,7 +90,7 @@
"valid": true
},
{
"formerly": "valid/missing_version.json",
"formerly": "invalid/missing_version.json",
"description": "TBD",
"data": {
"@type": "ngff:Image",
@@ -126,7 +126,7 @@
}
]
},
"valid": true
"valid": false
},
{
"formerly": "valid/invalid_axis_units.json",
@@ -857,6 +857,43 @@
},
"valid": false
},
{
"formerly": "invalid/missing_version.json",
"description": "TBD",
"data": {
"multiscales": [
{
"axes": [
{
"name": "y",
"type": "space",
"unit": "micrometer"
},
{
"name": "x",
"type": "space",
"unit": "micrometer"
}
],
"datasets": [
{
"path": "0",
"coordinateTransformations": [
{
"scale": [
1,
1
],
"type": "scale"
}
]
}
]
}
]
},
"valid": false
},
{
"formerly": "invalid/invalid_axis_type.json",
"description": "TBD",
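Each entry in the test suite above pairs a metadata document with an expected `valid` flag. A runner for such a suite can be sketched as follows; the `is_valid` function here is a stand-in for full JSON-schema validation and checks only the rule this commit adds (every multiscales entry must carry "datasets", "axes", and "version").

```python
def is_valid(doc):
    """Stand-in validator: check only the required-keys rule
    that this commit adds to image.schema."""
    required = {"datasets", "axes", "version"}
    return all(required <= entry.keys()
               for entry in doc.get("multiscales", []))

def run_suite(suite):
    """Return, per test case, whether the validator agrees with
    the case's expected `valid` flag."""
    return [is_valid(case["data"]) == case["valid"] for case in suite]

suite = [
    {"data": {"multiscales": [{"datasets": [], "axes": [],
                               "version": "0.4"}]},
     "valid": True},
    {"data": {"multiscales": [{"datasets": [], "axes": []}]},
     "valid": False},  # missing "version", as in the new case above
]
print(run_suite(suite))  # → [True, True]
```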
