Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not require for all task inputs to be filled by the calling workflow #347

Closed
wants to merge 7 commits into from

Conversation

rhpvorderman
Copy link
Contributor

@rhpvorderman rhpvorderman commented Dec 11, 2019

This is a follow-up on the discussion in broadinstitute/cromwell#5317.

The problem

Requiring the workflow to fill all task inputs is a bad idea for the following reasons:

  1. Some tasks have a multitude of options. STAR for example has more than a hundred flags.
    Most of these are optional. Still the spec requires them to be filled. That is a huge burden on
    the developer.
  2. If some insignificant optional input for a task changes, this requires pipeline developers to update
    all workflows that depend on this task.
  3. Satisfying all possible task inputs will create a workflow WDL with a huge input section. Also if there are subworkflows involved all these inputs have to be passed trough. Imagine running STAR in a subworkflow. That will mean most of the WDL is just passing inputs. This will drown the logic in the pipeline. The WDL will become totally unreadable, which is opposite to WDL's design goals in my opinion.

The solution

Using properly namespaced inputs. This way the workflow can satisfy all the required inputs for a workflow and propagate them to top level. This will lead to wdl files with relatively small input sections. All optional inputs (like an obscure flag on STAR) can then be input by the user by using namespaced inputs.

Cromwell 47 and earlier versions already implement this. We use this feature heavily in BioWDL (example here).

48 will not support fully qualified inputs anymore because of the current development spec and because the WDL1.0 spec does not mention this. In my opinion that is a huge regression in usability that needs to be rectified in the spec.

EDIT: Please note that this spec change does not break things if you do want to include all task inputs in the top level workflow.

versions/development/SPEC.md Outdated Show resolved Hide resolved
versions/development/SPEC.md Outdated Show resolved Hide resolved
versions/development/SPEC.md Outdated Show resolved Hide resolved
@cjllanwarne
Copy link
Contributor

Looks good to me. I would/will thumb this as-is, if/when given the chance

@mlin
Copy link
Member

mlin commented Dec 17, 2019

I too value not having to declare workflow inputs whose only purpose is to tediously feed straight into some required call input. There's an upside we should acknowledge, though, in making all the workflow's required inputs plainly visible in one code section. If that's lacking and workflow calls tasks from several imported documents, the user may practically need help from a static analysis program to figure out what inputs they actually need to write.

So I didn't mind the compromise of declaring & passing through required inputs while being able to omit optional inputs, although it was a little weird. Open to new ideas...

tagging @orodeh as I recall we discussed this at length in the past.

@cjllanwarne cjllanwarne mentioned this pull request Dec 23, 2019
@rhpvorderman
Copy link
Contributor Author

It seems there is no controversy about this. Shall we get this into voting? Who can initiate a voting process? @geoffjentry ?

@mlin
Copy link
Member

mlin commented Jan 28, 2020

Hi all, there is also an impact on workflow composition -- workflow calling another sub-workflow. If the 'inner' workflow has this kind of implicit required input, currently there is no syntax for the 'outer' workflow to supply them in the call inputs:, which does not admit namespaced keys.

So use of this feature, while often convenient, (i) obfuscates a workflow's required inputs and (ii) interferes with composition of the workflow.

I believe we are just codifying a past behavior of Cromwell than many of us enjoyed over the years, and which miniwdl also currently supports. However, these downsides are significant and perhaps worth a passing mention in the spec where the feature is discussed.

cc @kislyuk @orodeh

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Jan 28, 2020

So use of this feature, while often convenient, (i) obfuscates a workflow's required inputs and (ii) interferes with composition of the workflow.

This is a good point. But in practice we do pass along the required inputs to the calling workflow.
However, this is just good practice and nowhere is this codified. I can change the PR to state that all required inputs need to be filled by the calling workflow. This will still address issues 1,2 and 3 that I mentioned in the top post.

However this has the significant downside of being a burden on the engine developers, having to distinguish between required and non-required inputs.

I think that not passing all required inputs to the top workflow is a bad practice. Not only that it will make the workflow a lot harder to use for users. I think this is enough of a deterrent. I do not think that this needs to be enforced by the engines and workflow runners.

I believe we are just codifying a past behavior of Cromwell than many of us enjoyed over the years, and which miniwdl also currently supports.

We have enjoyed it over the years because it is a very useful tool. I would like to keep this tool in the toolbox. The tool can be misused, but I think the advantages of using it outweigh the disadvantages that only occur when someone is using it wrong.

EDIT: I think this behavior is very useful for optional inputs to tasks, but it is indeed very bad for required inputs to tasks. The PR should indeed reflect that.

However, these downsides are significant and perhaps worth a passing mention in the spec where the feature is discussed.

I agree. I will update the PR accordingly.

@rhpvorderman
Copy link
Contributor Author

@mlin I updated the PR. Does this properly address your concerns?

@mlin
Copy link
Member

mlin commented Jan 28, 2020

LGTM, thanks!

A possible long-term future idea from @kislyuk: some notation in the call inputs section saying, 'if an input is not provided here and the workflow has an input of the same name, pass it through'

Copy link
Contributor

@orodeh orodeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea makes sense to me. Some of the wording is a bit confusing, in my opinion.

* It is good practice for workflows to provide the *required* inputs for its tasks. If this is
not done the workflow cannot be called as a subworkflow as the call `input:` section does
not allow namespaced inputs. Also, if a workflow does not fill its tasks' required options then these should be supplied by the user, while not being listed in the workflow `input` section. This can cause confusion and make the workflow harder to use.
Engines may optionally enforce the good behavior of supplying all tasks' required inputs by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be enforced. It isn't hard for an engine to distinguish optional from required inputs. If a required input is not provided, the task is most likely to fail. The nice property here is that it can be checked statically, before running a large workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I made this optional on purpose, so I would not burden the engine developers with this. I can imagine it is not as straightforward as it may sound. But I am curious what the engine developers actually think. @mlin, @cjllanwarne ?

[the section on fully qualified names](#fully-qualified-names--namespaced-identifiers).
* *Optional* inputs (or required inputs with defaults) for tasks are not required to be
satisfied.
* It is good practice for workflows to provide the *required* inputs for its tasks. If this is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly makes sense to say that a workflow doesn't have to provide all the inputs, but only the required ones. This particular point is confusing though. The original text is sufficient, in my opinion:

If a workflow will only ever be submitted as a top-level workflow, it may optionally leave its tasks' inputs unsatisfied. This then forces the engine to additionally supply those inputs at run time. In this case, the inputs' names must be qualified in the inputs as workflow_name.task_name.input_name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think that @mlin was referring to this exact original text you quote when he said:

Hi all, there is also an impact on workflow composition -- workflow calling another sub-workflow. If the 'inner' workflow has this kind of implicit required input, currently there is no syntax for the 'outer' workflow to supply them in the call inputs:, which does not admit namespaced keys.

So use of this feature, while often convenient, (i) obfuscates a workflow's required inputs and (ii) interferes with composition of the workflow.

I agree that the current text is not quite clear. But I think that the better option is to make it required for all required task inputs to be satisfied, as you suggest. The text will become more clear that way. But as stated, I would like to know how the engine developers feel about this.

@orodeh
Copy link
Contributor

orodeh commented Jan 28, 2020

A serious issue that we ran into at DNAnexus, which @mlin was hinting at, is the inheritance of optionals. Assume workflow W calls tasks A,B, and C, but does not specify any of the optional task inputs. These then become workflow level inputs, for example W.A.x, W.B.y, W.C.z. In many real world cases the number of optionals is large. This becomes untenable for a graphical user interface (UI) trying to represent the top level workflow W, because the number of inputs will not fit in a web-page (or ten web pages).

We do not have a general solution for the problem, but dxWDL tries to limit this behavior so that compiled workflows can be presented in our UI.

@rhpvorderman
Copy link
Contributor Author

This becomes untenable for a graphical user interface (UI) trying to represent the top level workflow W, because the number of inputs will not fit in a web-page (or ten web pages).

We do not have a general solution for the problem, but dxWDL tries to limit this behavior so that compiled workflows can be presented in our UI.

Interesting. In BioWDL workflows we use parameter_meta for this. We provide a description and a category for each parameter. Categories are: required, common and advanced. @DavyCats has created WDL-AID which can create html pages were all required inputs are on top, all common inputs are listed below that (including nested optional inputs for tasks) and all advanced inputs are in a hide/show section. See an example here. It is not perfect but it solves the problem for us.

@DavyCats
Copy link
Contributor

@orodeh Correct me if I'm wrong here, but this is an issue regardless of whether they are passed through in a namespaced manner (as per this PR) or they are defined in the top-level inputs sections and then passed from call to call explicitly, right? Either way, all of these inputs will be available and would thus get rendered in a UI.

@orodeh
Copy link
Contributor

orodeh commented Jan 29, 2020

@DavyCats That's right. All of these inputs will need to be rendered. The point is that the user is not expecting there to be so many inputs. If the tasks have a small number of required inputs, but many optional inputs, then, the user is typically surprised to see all of the optionals become top-level inputs.

@rhpvorderman
Copy link
Contributor Author

@orodoh, agreed, and this is why the PR states that optional inputs don't have to be filled in by the top level workflow. That solves this issue, correct? I am not so sure I understand what the problem is here. Is there something that needs to be addressed in the PR for this particular issue?

@orodeh
Copy link
Contributor

orodeh commented Jan 30, 2020

The issue is that the input section is very large, even if you don't have to specify everything in it. As an engine developer, this is a problem, for example, because the UI has to render the entire input section. I suggest that we explicitly limit this behavior to a workflow calling tasks; it would not work for a workflow calling sub-workflows.

For example, let's say that:

  • workflow W calls task A and sub-workflow W2
  • Task A has optional input x
  • Sub-workflow WS has optional input x

Then, A.x is an implicit input for W, but WS.x is not. This would limit the namespace keys to one level.

@cjllanwarne
Copy link
Contributor

cjllanwarne commented Jan 30, 2020

I slightly disagree with the direction this PR is heading in. I'm much rather have flexibility and simplicity than a rigid set of complex rules.

If it were up to me, this PR would just change those two existing bullets to be:


  • A task usually has its inputs supplied when called by a workflow.
    • Example: call my_task { input: my_task_input=... }
  • A workflow is allowed not to specify all inputs in this block. In this case, the inputs bubble up to become an input to the workflow instead
    • Example: an unsupplied input might have the fully-qualified name my_workflow.my_task.my_task_input.
  • If that workflow is used as a subworkflow, the input is allowed to bubble up again with a further-qualified name.
    • Example: my_outer_workflow.my_workflow.my_task.my_task_input.
  • There is currently no way to supply a bubbled-up input in an outer workflow's call block.
    • Example: one cannot say call my_workflow as subworkflow { inputs: my_task.my_task_input=... }
  • All required inputs which bubble up like this must ultimately be provided in the inputs set before the top-level workflow can be run.
  • An engine may choose whether to expose optional bubbled-up inputs to end users, and if so it may choose how many layers of nesting it chooses to expose.

Notes:

  • I believe the final bullet should allow UIs to cut down the amount of screen space given over to optional inputs from deeply nested subworkflows.
  • I also think "best practices" are a better place to set rules and expectations than the SPEC itself. I've been bitten in the past assuming "nobody would ever want to do this" and then seeing that actually there are edge cases where it would have been really useful.

@orodeh
Copy link
Contributor

orodeh commented Jan 30, 2020

@cjllanwarne, this sounds good to me. If an engine can limit the bubble up behavior, then I am just fine with it.

@rhpvorderman
Copy link
Contributor Author

@cjllanwarne I agree, with all points except this one:

An engine may choose whether to expose optional bubbled-up inputs to end users, and if so it may choose how many layers of nesting it chooses to expose.

But if it is optional for the engines, that basically means workflow authors can not rely on optional inputs to bubble up. If it is not reliable, it can not be used.

Which means every time a user (for example) wants to use one of the 100+ flags of STAR that is optional, this flag will have to be propagated to the top workflow.

The whole point of this PR (from my point of view), is to make sure the situation where all optional task inputs need to be propagated to the top-level workflow is avoided at all cost. Because that will lead to humongous input sections.

Maybe to solve @orodeh 's problem it can better be reworded like this:

An engine must allow all optional bubbled-up inputs for end users to set. The engine has the option to hide these inputs from the user to prevent clutter and if so it may choose how many layers of nesting it chooses to hide.

====

I also think "best practices" are a better place to set rules and expectations than the SPEC itself. I've been bitten in the past assuming "nobody would ever want to do this" and then seeing that actually there are edge cases where it would have been really useful.

Agreed, I will remove that part.

Also everything with the word "optional" should be avoided in the SPEC, i think. As it is aimed at setting standards. (Yes, I am aware that I am guilty of this myself.)

What do you think @cjllanwarne , @orodeh ?

@orodeh
Copy link
Contributor

orodeh commented Jan 31, 2020

I can see why this is a valuable property, and we have users who really wanted it. I did try to support the full bubble up behavior, but found that it was hard an probably impractical. You can see the discussion here (https://github.com/dnanexus/dxWDL/blob/master/doc/MissingCallArguments.md).

It totally makes sense for you to run your workflow on an engine that supports this property, such as Cromwell and MiniWDL. However, think this should be optional for an engine implementation.

@cjllanwarne
Copy link
Contributor

@orodeh in that ticket it sounds like all unsupplied inputs are tricky for you (even those currently allowed in the SPEC)? Is there anything particularly more tricky about bubbled-up inputs?

@rhpvorderman on reflection, I agree with you that we should preferably say that engines have flexibility on these inputs being allowed to be "hidden by the UI" rather than "ignorable by the engine".

Since workflows are often passed around as "workflow + sample inputs", having some engines respect the bubbled-up inputs and some not would seem to me to be a bad idea. However, having some UIs hide the optional bubbled up inputs from users as "very noisy" when generating new inputs sets seems fine.

@rhpvorderman
Copy link
Contributor Author

@cjllanwarne Since your list of bullet points was much more clear and much better worded I copied it mostly at verbatim. I made a change in the last bullet points to reflect the discussion.

Engines must allow all bubbled-inputs for optional task to be filled by the end user. However engines may choose how many layers are made visible to the end user.

Does this solve your problem @orodeh ?

@orodeh
Copy link
Contributor

orodeh commented Feb 6, 2020

Here is a deeper explanation for how dxWDL works, and what the root problem is. This is a simplified description of the dnanexus system, but should convey the essence of it.

Introduction

Natively, dnanexus supports applets and dx:workflows. An applet is equivalent to a WDL task with no expressions and a simple type system (Int, String, Array[Int], Array[String] and a few others). A dx:workflow is equivalent to a linear WDL workflow with no expressions, conditionals, or scatters.

An example of a dx:workflow, written in WDL:

workflow W {
    input {
         File reference
         Int  threshold
    }

    call A { input: g=reference, t = threshold }
    call B  { input : bam = A.bam }
    call C { input : files = B.files }

    output {
       File assembly = C.assembled
   }
}

The simplicity of the the dx:workflows makes the infrastructure scalable (job scheduling, control, etc.). The challenge is the big gap between what WDL supports and what dnanexus provides. For example, any WDL expression requires running a cloud job to evaluate. By comparison, Cromwell can evaluate WDL expressions easily, at the cost of a scaling challenge for the Cromwell server(s).

The task of the dxWDL compiler is to statically compile a WDL program into applets and dx-workflows. It attempts to:

  1. Generate readable dx:workflows and applets, so they would present well on the UI and allow debugging and tracing
  2. Minimize the number of generated objects
  3. Minimize the number of extra jobs required for execution and evaluation

Argument passing

The crux of the problem involves what happens when a workflow calls a sub-workflow. In dnanexus, this translates into a dx:workflow calling a dx:subworkflow. A dx:subworkflow is the same as a top level workflow, with the exception that all of its arguments must be declared up front, and must be provided at runtime (unless optional).

Let's say that dx:workflow A calls dx:subworkflow B that calls dx:subworkflow C.

workflow A {
    call B
}

workflow B {
    call C
}

workflow C {
    input {
        Int x
    }
}

C takes a parameter x. In order to make input C.x visible to A, so that they user can set it, we must modify the input sections into something like this:

workflow A {
   input {
       Int  C___x
   }
    call B { input: C___x : C___x }
}

workflow B {
   input {
       Int  C___x
   }
    call C { input : x = C___x }
}

workflow C {
    input {
        Int x
    }
}

We need to prefix x with C because workflows A and B may also have an input called x. We can't use the dot (.) character because that is reserved in dnanexus. We reach a fairly unsightly name mangling scheme.

This must be done for any parameter we want to pass to C, be it optional or required. This can be done, and I have tried it. The problems I ran into are:

  1. The generated workflows are unreadable
  2. Normal programming invariants don't hold. The user thinks that workflow A has no inputs, however, it does. In fact, the user needs to dig into workflow C to figure this out, reading B is insufficient.
  3. The UI becomes unusable

Therefore, I settled on supporting only one level of argument passing, in the special case of a top-level workflow calling tasks.

I hope this gives people a better picture.

@rhpvorderman
Copy link
Contributor Author

@orodeh, I see were you are coming from. This will be hard to implement in dxWDL and it will lead to numerous difficulties if it is implemented in dxWDL.

What do you think of the following statement:

  1. dxWDL is not a WDL engine. It is a transpiler. It transpiles WDL into dxWorkflows. It therefore has to take into account features of the dxWorkflow system. dxWDL may therefore choose to only implement a subset of the spec.

Also what do you think of the statement I put in the inital post?

EDIT: Please note that this spec change does not break things if you do want to include all task inputs in the top level workflow.

Given these statements, is there still a problem for dxWDL?

@orodeh
Copy link
Contributor

orodeh commented Feb 10, 2020

@rhpvorderman dxWDL may be a compiler, but the reason for its existence is to be able to run on the dnanexus engine. It should be able to run and accept any WDL program. Therefore, I prefer the wording by @cjllanwarne with the allowance for an engine (dnanexus or dxWDL or both) to limit the bubble up behavior.

My deeper concern is that the idea of unlimited bubble-up behavior runs counter to normal programming practice and thinking. Yes, it will work nicely for your use case. However, WDL programs lose the important invariant that you can visually see all the inputs to a call to sub-workflow W by reading the input section for W.

@rhpvorderman
Copy link
Contributor Author

My deeper concern is that the idea of unlimited bubble-up behavior runs counter to normal programming practice and thinking.

WDL is not a programming language. It solves a different problem and therefore requires a different approach. It calls command line tools. These tools have optional flags.

I don't mind declaring all the inputs that I think are important. But researchers might have some special requirements for flags of certain tools. Instead of having to take into account every special use case, this bubble up behavior solves the problem while keeping workflows readable.

Yes, it will work nicely for your use case. However, WDL programs lose the important invariant that you can visually see all the inputs to a call to sub-workflow W by reading the input section for W.

You can argue also the other way round, that requiring all inputs to be filled, leads to a humongous number of inputs that obscure the inputs that are really important.

Therefore, I prefer the wording by @cjllanwarne with the allowance for an engine (dnanexus or dxWDL or both) to limit the bubble up behavior.

I can understand that, and I am not opposed to that. However my deeper concern is that making it optional does accomplish nothing. It will make inputs jsons and workflows essentially non-portable across engines.

Anyway. I don't mind being pragmatic about this. We both have entrenched opinions on this, and the likelihood of agreement seems to be near-zero. The primary motivation for this PR was that Cromwell breaks certain behavior that was allowed in the past. This was due to a word change in the spec. This PR attempts to correct that.

If @cjllanwarne doesn't mind supporting this behavior in Cromwell, I can use the original wording and make this optional.

Under protest, because I think writing: "Hey engine developer, do whatever you want!" has no place in a spec.

@DavyCats
Copy link
Contributor

My deeper concern is that the idea of unlimited bubble-up behaviour runs counter to normal programming practice and thinking.

I'm not entirely in agreement here. Various programming languages allow for this type of behaviour. For example, in R one can define a function as having the ... argument. This allows you to call that function with keyword arguments which aren't explicitly defined in the function but belong to some internally called one.
Similar behaviour can also be achieved in python using the *args and **kwargs syntax.

Under protest, because I think writing: "Hey engine developer, do whatever you want!" has no place in a spec.

The specification wouldn't be very specific anymore if it allowed implementations to do whatever they want. It might be okay in some cases to allow for some optional features but we have to careful here, I think.

@orodeh
Copy link
Contributor

orodeh commented Feb 18, 2020

Abstracting away from our DNAnexus experience, I think there is one more point worth mentioning here. In case others will learn from it.

We have customers with FDA approved pipelines that required locked down workflows. Such a workflow has an input section that cannot be circumvented. That is, you cannot feed arguments via other routes.

This is a case where we strictly wanted to disallow the bubble up behavior.

@DavyCats
Copy link
Contributor

Perhaps in that case adding syntax to explicitly enable the bubble up behaviour might be a good idea. That way people will also be able to see it when looking at the WDL file.

@rhpvorderman
Copy link
Contributor Author

Very good idea @DavyCats . Now we need some intuitive syntax to do so.

Proposal 1: comment-based. Inspired by python.

workflow example {
     inputs {  # locked: false
     }
}
# Or the comment can be at the workflow level
workflow example {   # locked: false
     inputs {
     }
}

Proposal 2: new lock keyword

workflow example {
     lock true
     inputs {
     }
}

Proposal 3:
No extra language keywords. The execution engines decide whether to run in locked or unlocked mode based on CLI flags, settings files etc.

I think proposal 3 is by far the best option. This means I have to rewrite this PR a bit. I propose that I write down the behavior for locked and unlocked mode. And then make it optional for engines which mode(s) to support. I know that for cromwell, it is only one line of code. So it should be easy to support both modes. I do not think this will be hard for engine developers to implement. What do you think about this @cjllanwarne , @mlin , @orodeh ? @DavyCats, @patmagee what do you think of this as WDL users?

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Feb 19, 2020

Proposal for wording this in the spec (iteration 1):

  • There are two modes for satisfying workflow inputs. locked and unlocked mode. It is up to the engine which mode(s) to support.
  • In locked mode all required subworkflow inputs must be satisfied by the calling workflow. The user can only supply inputs that are defined in the input section of the calling workflow.
  • In unlocked mode all required inputs which bubble up must be provided
    in the inputs set before the top-level workflow can be run.
  • In unlocked mode an engine must allow optional bubbled-up inputs to be filled in by end
    users. These inputs are not required to be filled.
  • Showing all optional inputs which are deeply nested in the workflow can lead
    to a very large number of inputs. Therefore, it is up to the engine how many
    layers of nesting it chooses to make visible in the UI.

Thinking this over again, I think we can ensure portability between locked and unlocked modes by requiring required subworkflow inputs to be filled by the calling workflow. The proposal becomes then as follows (iteration 2):

  • All required subworkflow and task inputs must be satisfied by the calling workflow.
  • There are two modes for satisfying workflow inputs. locked and unlocked mode. It is up to the engine which mode(s) to support.
  • In locked mode the user can only supply inputs that are defined in the input section of the calling workflow.
  • In unlocked mode an engine must allow optional bubbled-up inputs to be filled in by end
    users. These inputs are not required to be filled.

I feel iteration 2 works really well:

  • portability between workflow engines
  • Gives users flexibility to set optional flags on tasks if they need that flexibility
  • Addresses the need for FDA approved locked down workflows.

@DavyCats
Copy link
Contributor

Are you proposing that this lock syntax will be a required part of the workflow? If not, I think it might be preferable to have to disable the lock rather than enable it. That way there would be an explicit indication in the WDL file that more inputs are available.

I would rather not make comments meaningful on an execution level. So I'd go with the second proposal.

If this were on task level, this seems like something that would be in the runtime section. I'm wondering if it's worth it to allow for a runtime section to be added to workflows as well. Although I'm guessing there wouldn't be many meaningful runtime attributes for a workflow.

@rhpvorderman
Copy link
Contributor Author

Are you proposing that this lock syntax will be a required part of the workflow? If not, I think it might be preferable to have to disable the lock rather than enable it. That way there would be an explicit indication in the WDL file that more inputs are available.

I agree with that. The lock should be explicitly disabled.

However I do think this is done best on the engine side, because the extra syntax looks a bit ugly in my opinion. I think the default mode should be "locked" mode, and "unlocked" mode can be optionally supported.

@cjllanwarne
Copy link
Contributor

cjllanwarne commented Feb 20, 2020

The "use syntax to enable this" reminds me of another option that came up in the past - that an author might call out which calls are bubbling up, and the engine rewrites your WDL to insert the passthrough variables.

Eg you could have something like:

workflow foo {
  inputs {
    passthrough all from my_task
  }
  call my_task
}

task my_task {
  inputs {
    Int x
    Int y
    Int z
  }
}

in which case the workflow inputs would be:

{
  "foo.x": 1
  "foo.y": 1
  "foo.z": 1
}

In case of collision (eg if more than one task had the same unsupplied input name, we could say either that they both get the same value, or it's a parsing error, or they need to disambiguate with an extra prefix (see additional options below).

Within that framework, a couple of other options might also be possible, eg things like:

  • passthrough required from my_task
    • To only pass through required (ie non-optional) inputs
  • passthrough all from my_task prefixed with "my_task_"
    • To rename the inputs to avoid collisions
  • passthrough all from all
    • For the super-lazy option: pass through every unsupplied value from every call

... just one more option to throw into the mix... 😄

@rhpvorderman
Copy link
Contributor Author

Interesting proposal @cjllanwarne !

In case of collision (eg if more than one task had the same unsupplied input name, we could say either that they both get the same value, or it's a parsing error, or they need to disambiguate with an extra prefix (see additional options below).

I think this is potentially more confusing than simply using namespaced inputs.

I am still in favour of not adding extra syntax, but solving this engine side, defaulting to the behavior were bubbling up is not possible.
This means you can get all the inputs from reading the input section by default, and that if you want more control you flip a switch on the engine. To me that seems to be the easiest and most straightforward way of solving this.

@rhpvorderman
Copy link
Contributor Author

Since this discussion got quite long I feel that the original title and introductory post no longer matches with the PR, so I opened a new one #359 .

@rhpvorderman rhpvorderman deleted the reworkinputs branch March 6, 2020 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants