
merge from ado #19

Merged
merged 27 commits into from
Jan 10, 2024


@elonp (Contributor) commented Jan 9, 2024

Merged devops main into github main.
There were lots of conflicts, all resolved by accepting the github main version, so the PR contains no actual file changes; it only merges the commit histories.

elonp and others added 27 commits October 9, 2023 17:11
* `OnemlProcessorsPipelineOperationsServices.COLLECTION_TO_DICT` returns a pipeline with an in_collection and an output that exposes the collection as a dictionary.
* `OnemlProcessorsPipelineOperationsServices.DICT_TO_COLLECTION` returns a pipeline with a dictionary input and an out_collection that exposes the entries of the input dictionary as collection entries.
* `OnemlProcessorsPipelineOperationsServices.DUPLICATE_PIPELINE` takes a pipeline and returns a pipeline with multiple copies of it.
* `OnemlProcessorsPipelineOperationsServices.EXPOSE_GIVEN_OUTPUTS` takes data (dict of outputs, dict of dict of out collections) and creates a pipeline that exposes that data as output.
* `OnemlProcessorsPipelineOperationsServices.EXPOSE_PIPELINE_AS_OUTPUT` takes a pipeline, returns an identical pipeline except it has an additional output exposing the given pipeline.
* `OnemlProcessorsPipelineOperationsServices.LOAD_INPUTS_SAVE_OUTPUTS` takes a pipeline with an in_collection called `inputs_to_load` and an out_collection called `outputs_to_save` and returns a pipeline that loads the requested inputs from uris, passes them to the original pipeline, then saves the requested outputs to uris.
* `OnemlHabitatsPipelineOperationsServices.PUBLISH_OUTPUTS_AS_DATASET` takes a pipeline that loads and saves to/from uris and returns a pipeline that loads from datasets and publishes a dataset.
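As a rough illustration of what `LOAD_INPUTS_SAVE_OUTPUTS` does, here is a toy model in plain Python. It is not the oneml API: the pipeline is modeled as a function from input names to values, `load_inputs_save_outputs`, `read_uri` and `write_uri` are hypothetical names, and the `input_uris` / `output_uris` dicts stand in for the `inputs_to_load` / `outputs_to_save` collections.

```python
from typing import Any, Callable, Dict, Mapping

# Toy model, NOT the oneml API: a pipeline maps named inputs to named outputs.
Pipeline = Callable[[Mapping[str, Any]], Dict[str, Any]]


def load_inputs_save_outputs(
    pipeline: Pipeline,
    read_uri: Callable[[str], Any],         # hypothetical reader: uri -> object
    write_uri: Callable[[str, Any], None],  # hypothetical writer: (uri, object) -> None
) -> Callable[[Mapping[str, str], Mapping[str, str]], None]:
    """Wrap `pipeline` so it reads its inputs from uris and writes requested outputs to uris."""

    def wrapped(input_uris: Mapping[str, str], output_uris: Mapping[str, str]) -> None:
        # Load every requested input from its uri.
        inputs = {name: read_uri(uri) for name, uri in input_uris.items()}
        # Run the original pipeline on the loaded values.
        outputs = pipeline(inputs)
        # Save only the outputs that were requested.
        for name, uri in output_uris.items():
            write_uri(uri, outputs[name])

    return wrapped


# Usage with an in-memory dict standing in for real storage.
store: Dict[str, Any] = {"mem://x": 3}
double = load_inputs_save_outputs(
    lambda inputs: {"y": inputs["x"] * 2},
    read_uri=store.__getitem__,
    write_uri=store.__setitem__,
)
double({"x": "mem://x"}, {"y": "mem://y"})
```

The wrapper leaves the original pipeline untouched and only adds the load and save steps around it, which mirrors the description above of returning a new pipeline rather than modifying the given one.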
allow `from oneml.habitats.immunocli import OnemlHabitatsCliDiContainer`, because cli di containers should be exposed
test load_inputs_save_outputs when pipeline already has input_uris and output_uris
fix services in oneml.habitats.pipeline_operations._datasets
remove furl from `oneml.habitats.pipeline_operations` because it lower-cases the dataset name in `ampds://` uris
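The lower-casing problem is a general one: in an `ampds://` uri the dataset name sits where a URI host normally goes, and hosts are case-insensitive, so URL libraries may normalize them. The sketch below uses the standard library's `urllib.parse` (not furl, and not claiming to reproduce furl's exact code path) to show the difference between the host-oriented accessor and the raw network location; the dataset name `MyDataset` is made up.

```python
from urllib.parse import urlsplit

# The dataset name occupies the "host" position of the uri.
uri = "ampds://MyDataset/manifest.json"
parts = urlsplit(uri)

# Host-oriented accessors normalize case, because URI hosts are case-insensitive.
print(parts.hostname)  # mydataset
# The raw netloc keeps the original spelling, which a case-sensitive dataset name needs.
print(parts.netloc)    # MyDataset
```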
* Refactors plugins into every component.
* Adds hydra registry for pipeline providers.
* Adds two_diamond tests building a pipeline in YAML using a python pipeline provider.
expose BLOB_READ_USING_LOCAL_CACHE_FACTORY and BLOB_WRITE_USING_LOCAL_CACHE_FACTORY for use by other packages
* Adds `DuplicatePipelineConf` to be able to generate duplicate pipelines in YAML.
* Adds a three diamond test case.
📝 (docs) Update documentation.
`DatasetBlobStoreBaseLocationService` provides the blob locations for datasets published by oneml.
It should read these from an installation-level configuration, but at the moment they are hard-coded.
The container name used for production and non-production datasets outside notebooks was wrong; this PR hopefully fixes it.
fix get_relative_path when base_uri ends with /
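One common way such a bug arises is slicing off `len(base_uri) + 1` characters to skip the separator, which drops a real character when `base_uri` already ends with `/`. The function below is a hypothetical sketch of a trailing-slash-safe version, not the oneml implementation; its name mirrors the commit message but its signature is assumed.

```python
def get_relative_path(base_uri: str, uri: str) -> str:
    """Hypothetical sketch: return `uri` relative to `base_uri`, with or without a trailing '/'."""
    # Ensure exactly one separator at the end of the prefix, so both spellings behave the same.
    prefix = base_uri if base_uri.endswith("/") else base_uri + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"{uri!r} is not under {base_uri!r}")
    return uri[len(prefix):]


print(get_relative_path("file:///data/ds", "file:///data/ds/dir/file.txt"))   # dir/file.txt
print(get_relative_path("file:///data/ds/", "file:///data/ds/dir/file.txt"))  # dir/file.txt
```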
add test that verifies mypy reports an error when a service id is associated with a service that is a super-type of the declared service type
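The kind of check this test exercises can be sketched with a generic, invariant service id. `ServiceId`, `Registry`, `Animal` and `Dog` below are hypothetical names, not oneml's DI API. Because `ServiceId[T]` is invariant, `T` is pinned to the declared type, so mypy should reject a provider whose return type is a super-type of it; at runtime the code simply runs.

```python
from typing import Callable, Dict, Generic, TypeVar, cast

T = TypeVar("T")


class ServiceId(Generic[T]):
    """Hypothetical typed service id; T is the declared service type."""

    def __init__(self, name: str) -> None:
        self.name = name


class Registry:
    """Hypothetical registry associating service ids with providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], object]] = {}

    def register(self, service_id: ServiceId[T], provider: Callable[[], T]) -> None:
        # T is fixed by the (invariant) service id, so the provider must return
        # the declared type or a subtype of it, never a super-type.
        self._providers[service_id.name] = provider

    def get(self, service_id: ServiceId[T]) -> T:
        return cast(T, self._providers[service_id.name]())


class Animal: ...


class Dog(Animal): ...


DOG: ServiceId[Dog] = ServiceId("dog")
registry = Registry()
registry.register(DOG, Dog)  # OK: the provider returns a Dog
# registry.register(DOG, Animal)  # mypy should flag this: Animal is a super-type of Dog
```

A test of this kind typically runs mypy on such a snippet and asserts that the error is reported, since the mistake is invisible at runtime.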
🚸 Refactor pipeline input validation for optional inputs and add tests.
✨ Add pipeline drop_inputs and drop_outputs methods
test publish_outputs_as_dataset when there are no input_uris
Datasets published by oneml pipelines have a manifest.json at their root that maps output names to relative paths within the dataset.

Pipeline builders can read from uris using `OnemlProcessorsIoServices.READ_FROM_URI_PIPELINE_BUILDER`, which in turn uses `ReadFromUriProcessor`, a processor that takes a uri and returns the object read from it.

This PR adds the following semantics to uris:
If the uri has a fragment (i.e. the part that follows `#`), then the uri, after removing the fragment, is assumed to point to a json file. The fragment is assumed to be a dot-separated hierarchical key into the json. The value associated with the key should be either a path relative to the directory of the json file, or an absolute uri.

Examples:
If `file:///path1/path2/index.json` holds:
```json
{
    "links": {
        "rel": "path3/array.npy",
        "abs": "ampds://mydataset/",
        "another_abs": "ampds://mydataset/manifest.json#entry_uris.container1?namespace=mynamespace"
    }
}
```
and `ampds://mydataset/manifest.json` holds:
```json
{
    "entry_uris": {
        "container1": "containers/container1"
    }
}
```

Then (in the last case the resolved value itself has a fragment, so it is resolved in turn):
* `file:///path1/path2/index.json#links.rel` becomes `file:///path1/path2/path3/array.npy`
* `file:///path1/path2/index.json#links.abs` becomes `ampds://mydataset/`
* `file:///path1/path2/index.json#links.another_abs` becomes `ampds://mydataset/containers/container1?namespace=mynamespace`

The semantics are implemented in `ReadFromUriProcessor` and therefore applicable only to pipelines built directly or indirectly using `OnemlProcessorsIoServices.READ_FROM_URI_PIPELINE_BUILDER`.
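A minimal sketch of how these semantics could be implemented, independent of `ReadFromUriProcessor`'s actual code. `resolve_uri`, the `load_json` callback and the in-memory `DOCS` are hypothetical. Two behaviors are assumptions inferred from the `another_abs` example: a `?query` written after the key is carried over to the resolved uri, and a resolved value that itself has a fragment is resolved again.

```python
from typing import Any, Callable

# Hypothetical loader: takes a json uri (without fragment) and returns its parsed content.
JsonLoader = Callable[[str], Any]


def _is_absolute(uri: str) -> bool:
    # Assumption: anything carrying a scheme ("file://", "ampds://", ...) is absolute.
    return "://" in uri


def _directory_of(uri: str) -> str:
    # Everything up to and including the last "/".
    return uri.rsplit("/", 1)[0] + "/"


def resolve_uri(uri: str, load_json: JsonLoader) -> str:
    """Follow `json_uri#dot.separated.key` links until a uri without a fragment remains."""
    if "#" not in uri:
        return uri  # no fragment: nothing to resolve
    json_uri, fragment = uri.split("#", 1)
    # Assumption: a "?query" after the key is re-attached to the resolved uri.
    key, sep, query = fragment.partition("?")
    value: Any = load_json(json_uri)
    for part in key.split("."):
        value = value[part]  # walk the hierarchical key
    # Relative values are taken relative to the directory of the json file.
    target = value if _is_absolute(value) else _directory_of(json_uri) + value
    if sep:
        target = f"{target}?{query}"
    # The target may itself carry a fragment, so resolve again.
    return resolve_uri(target, load_json)


# The two documents from the example, served from memory.
DOCS = {
    "file:///path1/path2/index.json": {
        "links": {
            "rel": "path3/array.npy",
            "abs": "ampds://mydataset/",
            "another_abs": "ampds://mydataset/manifest.json#entry_uris.container1?namespace=mynamespace",
        }
    },
    "ampds://mydataset/manifest.json": {"entry_uris": {"container1": "containers/container1"}},
}

print(resolve_uri("file:///path1/path2/index.json#links.another_abs", DOCS.__getitem__))
```

Passing the loader as a callback keeps the resolution logic separate from I/O, so it can be tested against in-memory documents. The sketch does no cycle detection: two json files whose fragments point at each other would recurse until Python's recursion limit is hit.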

Test Results: oneml-pipelines

31 tests  ±0   31 ✅ ±0   0s ⏱️ ±0s
 1 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit 9140646. ± Comparison against base commit 2d33de3.


Test Results: oneml-processors

112 tests  ±0   112 ✅ ±0   4s ⏱️ ±0s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 9140646. ± Comparison against base commit 2d33de3.


Test Results: oneml-habitats

46 tests  ±0   46 ✅ ±0   4s ⏱️ ±0s
 1 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit 9140646. ± Comparison against base commit 2d33de3.

@jzazo jzazo merged commit b195e2d into main Jan 10, 2024
6 of 8 checks passed
@elonp elonp deleted the elonp/merge_from_ado branch March 1, 2024 09:56
@elonp elonp restored the elonp/merge_from_ado branch March 1, 2024 09:57
@jzazo jzazo deleted the elonp/merge_from_ado branch March 14, 2024 09:04
2 participants