Validate dataset #148
Changes from all commits
86c1494
082795d
f5a772f
e3b82d9
a7db63a
05fa5f1
0c456c5
8f2f174
eb5ff88
329ea0d
@@ -249,7 +249,7 @@ def upload_new_dataset_now(self, dataset: Dataset) -> Dataset:
        dataset = dataset.replace(
            source_folder=self._expect_file_transfer().source_folder_for(dataset)
        )
        dataset.validate()
        self.scicat.validate_dataset_model(dataset.make_upload_model())
        # TODO skip if there are no files
        with self._connect_for_file_upload(dataset) as con:
            # TODO check if any remote file is out of date.
@@ -829,6 +829,30 @@ def create_attachment_for_dataset(
            model.DownloadAttachment, _strict_validation=False, **uploaded
        )

    def validate_dataset_model(
        self, dset: Union[model.UploadDerivedDataset, model.UploadRawDataset]
    ) -> None:
        """Validate a dataset in SciCat.

        Parameters
        ----------
        dset:
            Model of the dataset to validate.

        Raises
        ------
        ValueError
            If the dataset does not pass validation.
        """
        response = self._call_endpoint(
            cmd="post",
            url="datasets/isValid",
            data=dset,
            operation="validate_dataset_model",
        )
        if not response["valid"]:
            raise ValueError(f"Dataset {dset} did not pass validation in SciCat.")
Can you change this to a

    def _send_to_scicat(
        self, *, cmd: str, url: str, data: Optional[model.BaseModel] = None
    ) -> requests.Response:
Can you also add a test of
Does SciCat return details about what fields failed validation? It does when you try to upload. So it would be good to show those here as well.
It does not. We checked, but it only returns `True` or `False`. Yes, extra info on which part of the validation failed would be nice.
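For reference, a minimal sketch of what a boolean-only validation response means for the error message. The `validate_or_raise` helper and the `fake_is_valid` stand-in (including its "owner" rule) are hypothetical and only mimic the shape of the `response["valid"]` check in the diff; they are not SciCat's or the client's actual API.

```python
from typing import Any, Callable, Dict


def validate_or_raise(
    call_endpoint: Callable[[Dict[str, Any]], Dict[str, Any]],
    dataset_fields: Dict[str, Any],
) -> None:
    """Raise ValueError if the validation endpoint rejects the dataset."""
    response = call_endpoint(dataset_fields)
    # The response carries only a boolean flag, so the error message
    # cannot name the fields that failed validation.
    if not response["valid"]:
        raise ValueError(
            f"Dataset {dataset_fields!r} did not pass validation in SciCat."
        )


def fake_is_valid(fields: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the datasets/isValid endpoint.
    # The rule "an owner must be present" is invented for this demo.
    return {"valid": "owner" in fields}
```

With this stand-in, `validate_or_raise(fake_is_valid, {"owner": "jane"})` returns silently, while a dataset without an owner raises a `ValueError` that can only repeat the input, not explain the failure.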
Ok
Is that a feature we should request from SciCat?
Possibly. To be honest, I'm hoping to get them to implement a transaction feature. Then we might not even need this extra validation step. I'm thinking through how that could work and will open an issue with a lot of details eventually.
Let's leave it as is for now.
What is a transaction feature in this situation?
A way to make either all uploads (dataset, datablocks, attachments, files) succeed or all fail, so we don't end up with partially uploaded data. `create_new_dataset_now` attempts to work this way but is limited by the SciCat API. I'm hoping to get a feature that lets us do this better. It's too complicated to explain here, though.
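To illustrate the all-or-nothing idea, here is a rough client-side sketch that undoes completed steps when a later one fails. It is an assumption-laden illustration, not how the client or SciCat actually works: `run_all_or_nothing`, `make_step`, and the step names are hypothetical, and undoing steps from the client is only an approximation of a real server-side transaction.

```python
from typing import Callable, List, Tuple

# A step is (name, action, undo action). All names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_all_or_nothing(steps: List[Step]) -> List[str]:
    """Run every step; if one fails, undo the completed ones in reverse order."""
    done: List[Step] = []
    try:
        for step in steps:
            _, do, _ = step
            do()
            done.append(step)
    except Exception:
        # Roll back whatever succeeded so no partial upload remains.
        for _, _, undo in reversed(done):
            undo()
        raise
    return [name for name, _, _ in done]


log: List[str] = []


def make_step(name: str, fail: bool = False) -> Step:
    """Build a demo step that records its actions in `log`."""

    def do() -> None:
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"upload {name}")

    def undo() -> None:
        log.append(f"remove {name}")

    return (name, do, undo)
```

If the undo itself can fail (for example, because the API offers no way to delete what was uploaded), this approach cannot guarantee a clean state, which is why server-side support would be preferable.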