Merge ADO latest main into github (#17)
* Merged PR 6148: Services that create pipelines
  * `OnemlProcessorsPipelineOperationsServices.COLLECTION_TO_DICT` returns a pipeline with an in_collection and an output that exposes the collection as a dictionary.
  * `OnemlProcessorsPipelineOperationsServices.DICT_TO_COLLECTION` returns a pipeline with a dictionary input and an out_collection that exposes the entries of the input dictionary as collection entries.
  * `OnemlProcessorsPipelineOperationsServices.DUPLICATE_PIPELINE` takes a pipeline and returns a pipeline with multiple copies of it.
  * `OnemlProcessorsPipelineOperationsServices.EXPOSE_GIVEN_OUTPUTS` takes data (dict of outputs, dict of dict of out collections) and creates a pipeline that exposes that data as output.
  * `OnemlProcessorsPipelineOperationsServices.EXPOSE_PIPELINE_AS_OUTPUT` takes a pipeline and returns an identical pipeline, except that it has an additional output exposing the given pipeline.
  * `OnemlProcessorsPipelineOperationsServices.LOAD_INPUTS_SAVE_OUTPUTS` takes a pipeline with an in_collection called `inputs_to_load` and an out_collection called `outputs_to_save` and returns a pipeline that loads the requested inputs from uris, passes them to the original pipeline, then saves the requested outputs to uris.
  * `OnemlHabitatsPipelineOperationsServices.PUBLISH_OUTPUTS_AS_DATASET` takes a pipeline that loads from and saves to uris and returns a pipeline that loads from datasets and publishes a dataset.
* Merged PR 6177: allow from oneml.habitats.immunocli import OnemlHabitatsCliDiContainer b/c cl...

  allow from oneml.habitats.immunocli import OnemlHabitatsCliDiContainer b/c cli di containers should be exposed
* Merged PR 6178: fix service method and test that the service is added
* Merged PR 6184: fixes to https://immunomics.visualstudio.com/Immunomics/_git/oneml/pullrequest/6148
* Merged PR 6187: Generic pipelines with two arguments only
* Merged PR 6193: Minor fix regarding instantiation order of namedcollection.
  * Fix.
  * Test.
* Merged PR 6206: test load_inputs_save_outputs when pipeline already has input_uris and output...

  test load_inputs_save_outputs when pipeline already has input_uris and output_uris
* Merged PR 6225: fix services in oneml.habitats.pipeline_operations._datasets
* Merged PR 6232: remove furl from oneml.habitats.pipeline_operations b/c it lower-cases datase...

  remove furl from oneml.habitats.pipeline_operations b/c it lower-cases dataset name in ampds://
* Merged PR 6234: fix register io for Manifest on abfss
* Merged PR 6236: fix register io for Manifest on abfss
* Merged PR 6239: fixed ComputeNodeBasedOutputUriProcessor introducing uris with empty path segments
* Merged PR 6231: Support pipeline providers in YAML
  * Refactors plugins into every component.
  * Adds hydra registry for pipeline providers.
  * Adds two_diamond tests building a pipeline in YAML using a python pipeline provider.
* Merged PR 6255: expose BLOB_READ_USING_LOCAL_CACHE_FACTORY and BLOB_WRITE_USING_LOCAL_CACHE_F...

  expose BLOB_READ_USING_LOCAL_CACHE_FACTORY and BLOB_WRITE_USING_LOCAL_CACHE_FACTORY for use by other packages
* Merged PR 6256: Add ability to duplicate pipelines in YAML.
  * Adds `DuplicatePipelineConf` to be able to generate duplicate pipelines in YAML.
  * Adds three diamond as test case.
* Merged PR 6250: oneml.processors.registry of pipeline providers
* Merged PR 6279: 📝 (docs) Update documentation.
* Merged PR 6286: fix container name in DatasetBlobStoreBaseLocationService

  DatasetBlobStoreBaseLocationService provides the blob locations for datasets published by oneml. It should read these from an installation-level configuration, but at the moment they are hard-coded. The container name used for production and non-production datasets outside notebooks is wrong. This PR hopefully fixes it.
* Merged PR 6288: Fix dataset blob location
* Merged PR 6293: fix get_relative_path when base_uri ends with /
* Merged PR 6298: 📝 Update docs.
* Merged PR 6311: Service inheritance test

  add test that verifies mypy screams when a service id is associated with a service that is a super-type of the declared service type
* Merged PR 6310: 🚸 Optional inputs when combining pipelines can be dropped and add tests.

  🚸 Refactor pipeline input validation for optional inputs and add tests.
* Merged PR 6312: ✨ Add pipeline drop_inputs and drop_outputs and tests

  ✨ Add pipeline drop_inputs and drop_outputs methods
* Merged PR 6322: test publish_outputs_as_dataset when there are no input_uris
* Merged PR 6331: Add support for symbolic links in uris

  Datasets published by oneml pipelines have a manifest.json at their root that maps output names to relative paths within the dataset. Pipeline builders can read from uris using `OnemlProcessorsIoServices.READ_FROM_URI_PIPELINE_BUILDER`, which itself uses `ReadFromUriProcessor`, a processor that takes a uri and returns the object read from it.

  This PR adds the following semantics to uris: if the uri has a fragment (i.e. something that follows #), then it is assumed that, after removing the fragment, the uri points to a json file. The fragment is assumed to be a dot-separated hierarchical key into the json. The value associated with the key should be either a relative path from the directory of the json, or an absolute uri.
  Examples: if `file:///path1/path2/index.json` holds:

  ```
  links:
    rel: path3/array.npy
    abs: ampds://mydataset/
    another_abs: ampds://mydataset/manifest.json#entry_uris.container1?namespace=mynamespace
  ```

  and `ampds://mydataset/manifest.json` holds:

  ```
  entry_uris:
    container1: containers/container1
  ```

  Then:
  * `file:///path1/path2/index.json#links.rel` becomes `file:///path1/path2/path3/array.npy`
  * `file:///path1/path2/index.json#links.abs` becomes `ampds://mydataset/`
  * `file:///path1/path2/index.json#links.another_abs` becomes `ampds://mydataset/containers/container1?namespace=mynamespace`

  The semantics are implemented in `ReadFromUriProcessor` and are therefore applicable only to pipelines built directly or indirectly using `OnemlProcessorsIoServices.READ_FROM_URI_PIPELINE_BUILDER`.
* Fix mypy and tests after merge.

---------

Co-authored-by: elonp <[email protected]>
Co-authored-by: jzazo <[email protected]>
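The symbolic-link rule described for PR 6331 can be illustrated with a small standalone sketch. This is not the code in `ReadFromUriProcessor`; `resolve_link`, `read_from_memory`, and `DOCUMENTS` are hypothetical names, and the in-memory dictionary stands in for reading real json files. Two behaviours are assumptions inferred from the third example in the commit message: links are followed recursively, and a `?query` suffix inside the fragment is carried over to the resolved uri.

```python
import re
from typing import Any, Callable, Mapping

# Matches strings that start with a scheme, e.g. "file://" or "ampds://".
_ABSOLUTE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def resolve_link(uri: str, read_json: Callable[[str], Mapping[str, Any]]) -> str:
    """Resolve a uri whose fragment is a dot-separated key into a json file."""
    base, has_fragment, fragment = uri.partition("#")
    if not has_fragment:
        return uri  # plain uri, nothing to resolve

    # Assumption: a "?query" after the key is re-attached to the result.
    key, _, query = fragment.partition("?")

    # Walk the hierarchical key through the json document.
    value: Any = read_json(base)
    for part in key.split("."):
        value = value[part]

    if _ABSOLUTE.match(value):
        target = value  # absolute uri: use it as-is
    else:
        # Relative path: resolve against the directory of the json file.
        target = base.rsplit("/", 1)[0] + "/" + value

    # Assumption: the target may itself be a link, so resolve recursively.
    resolved = resolve_link(target, read_json)
    return f"{resolved}?{query}" if query else resolved


# Hypothetical in-memory stand-ins for the two files in the example.
DOCUMENTS = {
    "file:///path1/path2/index.json": {
        "links": {
            "rel": "path3/array.npy",
            "abs": "ampds://mydataset/",
            "another_abs": "ampds://mydataset/manifest.json"
            "#entry_uris.container1?namespace=mynamespace",
        }
    },
    "ampds://mydataset/manifest.json": {
        "entry_uris": {"container1": "containers/container1"}
    },
}


def read_from_memory(uri: str) -> Mapping[str, Any]:
    return DOCUMENTS[uri]


for name in ("rel", "abs", "another_abs"):
    print(resolve_link(f"file:///path1/path2/index.json#links.{name}", read_from_memory))
```

Passing the reader in as a callable keeps the resolution logic independent of the storage backend, so the same sketch could be pointed at local files or blob storage by swapping `read_from_memory`.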