Add an interface to Python API for a set of files and jobspecs #158
…t into python-multiple-files-job
Thanks for doing this work, @edknv. I was attempting to try it out, but:
Does our init logic need an update to make BatchJobSpec importable?
@randerzander There has been an update in the client library, so I think it needs a reinstall.
But it raises another question. Every time we have an update in the client, customers will have the same issue. How do we make sure they are using the latest version of the client library?
I believe I'm installing the client library from your commit:
Creating the env and installing:
Hmm, I'm trying to repro but I can't seem to, even when using all the commands verbatim starting from
ok, you can ignore my feedback. I must have messed up my git index, it's working for me now :)
batch = job_indices[batch_start:batch_end]

# Submit each batch of jobs
batch_results = [self._submit_job(job_id, job_queue_id) for job_id in batch]
I'm just noticing this now, but we probably need to handle exceptions from _submit_job better. Currently, if we submit a batch of jobs and one fails, the rest of the items in the batch are failed as well.
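One way to isolate failures per job is sketched below. This is only an illustration, not the code in this PR: `submit_batch` and `submit_fn` are hypothetical names, with `submit_fn` standing in for `_submit_job`, whose real signature and exception types are not shown in this thread.

```python
from typing import Any, Callable, List, Tuple


def submit_batch(
    batch: List[Any],
    submit_fn: Callable[[Any], Any],
) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, Exception]]]:
    """Submit every job in the batch, recording failures instead of aborting.

    submit_fn is a hypothetical stand-in for the client's _submit_job.
    """
    submitted: List[Tuple[Any, Any]] = []
    failed: List[Tuple[Any, Exception]] = []
    for job_id in batch:
        try:
            # A failure here affects only this job, not the rest of the batch.
            submitted.append((job_id, submit_fn(job_id)))
        except Exception as exc:  # narrow this once the real exception types are known
            failed.append((job_id, exc))
    return submitted, failed
```

Returning the failures alongside the successes leaves it to the caller to decide whether to retry, log, or raise once the whole batch has been attempted.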
job_specs = create_job_specs_for_batch(files_batch)

job_ids = []
for job_spec in job_specs:
While we're in the process of turning the CLI code into a library, I'd like to be a bit more precise here and handle corner cases where more than one task of a single type is requested, or where multiple tasks of the same type are selected with different configuration parameters.
It's probably ok if we just reject duplicate tasks out of hand for now and raise an error, but it's also worth thinking through when/if we might want them.
Thanks. I thought about this, but in e1fceb6 I opted to reject duplicate tasks immediately and raise an error if there are any, mostly to avoid complexity.
Given the serial nature of our current pipeline, I didn't think duplicate tasks made much sense, but maybe there are some use cases where users might want to apply, for example, split tasks to pdf documents but not to pptx documents, or something like that? Or maybe in that case, two separate pipelines make more sense.
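For reference, the reject-on-duplicate behavior discussed above could look roughly like the sketch below. It is not the code from e1fceb6: the function name `check_no_duplicate_tasks` and the representation of tasks as plain type strings are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def check_no_duplicate_tasks(task_types: Iterable[str]) -> None:
    """Raise ValueError if any task type is requested more than once.

    Hypothetical helper; task types are assumed to be plain strings here.
    """
    counts = Counter(task_types)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(
            "Duplicate tasks are not supported: " + ", ".join(duplicates)
        )
```

Checking by task type alone also rejects two tasks of the same type with different parameters, which matches the "reject out of hand for now" approach and leaves room to relax the rule later.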
Co-authored-by: Devin Robison <[email protected]>
Description
Closes #78.
This PR adds BatchJobSpec for generating Jobs from a set of files and JobSpecs. In the following example from README, BatchJobSpec is interchangeable with JobSpec in the main API.
Checklist