Make Hail dry-run functionality more accessible #1117

Open
MattWellie opened this issue Jan 24, 2025 · 1 comment
Labels
enhancement New feature or request shared The change impacts both the LC and RD pipeline.

Comments


MattWellie commented Jan 24, 2025

It's non-trivial to 'dry-run' a pipeline. The main.py --dry-run functionality is very limited: it just prints the stages in the workflow, give or take some first/last/skip-stage handling through config.

What I really want to test locally is the code in queue_jobs and expected_outputs, as well as the interconnection between stages. This requires running the workflow locally without the --dry-run parameter (so that the Stage code, expected_outputs, and queue_jobs actually run), but with dry run communicated directly to Hail here through config, to prevent the assembled jobs from launching. To do this (AFAIK) I need to run the pipeline normally (without --dry-run), with a substitute config block like this:

dataset_gcp_project = "test-analysis-dataset-1234"
sequencing_type = "genome"
path_scheme = "local"
driver_image = "<stub>"

[hail]
billing_project = "test-billing-project"
delete_scratch_on_exit = true
dry_run = true
backend = "local"

It would be nice if this was a behaviour I could trigger with just a CLI setting.
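
As a rough sketch of what such a CLI setting could look like: the flag name --hail-dry-run and the build_overrides helper below are hypothetical, not part of the existing main.py, and how the returned overrides get merged into the real config is left open. The override values simply mirror the substitute config block above.

```python
import argparse


def build_overrides(argv: list[str]) -> dict:
    """Translate a hypothetical --hail-dry-run flag into config overrides."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag; the current CLI only offers --dry-run.
    parser.add_argument("--hail-dry-run", action="store_true")
    args = parser.parse_args(argv)

    if not args.hail_dry_run:
        # No flag given: leave the user's config untouched.
        return {}

    # Same keys and values as the hand-written substitute config block.
    return {
        "dataset_gcp_project": "test-analysis-dataset-1234",
        "sequencing_type": "genome",
        "path_scheme": "local",
        "driver_image": "<stub>",
        "hail": {
            "billing_project": "test-billing-project",
            "delete_scratch_on_exit": True,
            "dry_run": True,
            "backend": "local",
        },
    }


if __name__ == "__main__":
    print(build_overrides(["--hail-dry-run"])["hail"])
```

With something like this, the stage code would still run in full, and only the final job submission would be suppressed.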

Another glitch here is that, even with these config settings, some methods write directly to GCP during workflow setup (e.g. here or here; the latter responds to workflow.dry_run in config, which is a third place we can set dry_run, and it affects a different subset of behaviours).

@MattWellie MattWellie added enhancement New feature or request shared The change impacts both the LC and RD pipeline. labels Jan 24, 2025
@MattWellie
Contributor Author

I feel like this is a cpg-flow User Experience request rather than a Prod Pipes request... Maybe consolidate down to a single consistent dry_run value, plus a way to decorate methods with @inactivate_on_dry_run if they perform real writes during workflow setup?
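
A minimal sketch of how that decorator idea might work, purely for discussion. CONFIG is a stand-in dict rather than the real config loader, is_dry_run is a hypothetical helper that treats hail.dry_run and workflow.dry_run as one value, and write_marker is a made-up example of a method that would write during workflow setup. None of these exist in cpg-flow as far as I know.

```python
import functools
from typing import Any, Callable

# Stand-in for the loaded config; the real pipeline would read this from TOML.
CONFIG: dict = {"hail": {"dry_run": True}}


def is_dry_run() -> bool:
    """Single source of truth: True if either known config switch is set."""
    hail_flag = CONFIG.get("hail", {}).get("dry_run", False)
    workflow_flag = CONFIG.get("workflow", {}).get("dry_run", False)
    return bool(hail_flag or workflow_flag)


def inactivate_on_dry_run(func: Callable[..., Any]) -> Callable[..., Any]:
    """Skip the wrapped function (returning None) when a dry run is active."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if is_dry_run():
            return None
        return func(*args, **kwargs)

    return wrapper


@inactivate_on_dry_run
def write_marker(path: str, written: list[str]) -> str:
    """Hypothetical setup-time write; records the path instead of touching GCP."""
    written.append(path)
    return path
```

One caveat with this shape: decorated methods return None during a dry run, so any caller that uses the return value would need to tolerate that.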
