Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify optional and nonempty type constraints #290

Merged
merged 3 commits into from
Jun 21, 2019
Merged

Conversation

mlin
Copy link
Member

@mlin mlin commented Feb 19, 2019

Following discussion on issue #271, this PR is a straw-man resolution to current ambiguity over how/when the optional and nonempty type quantifiers are "enforced." It will definitely need debate and discussion, so fire away!

A couple of comments to start:

  • The select_first idiom is workable but not exactly loved; there's some discussion on Ease optional access with scatter-like block #187 about nicer sugar which could follow on.
  • On the other hand I don't think there's an idiomatic way to coerce Array[T] to Array[T]+ right now, which is needed if we want to enforce nonempty in the same way we propose to statically validate optionals. (And it would be a little weird to enforce one of them statically and the other only at runtime, right?)

Sometimes, optional parameters need a string prefix. Consider this task:
These constraints also apply to function arguments, with exceptions for string concatenation, described below, and the equality/inequality operators `==` and `!=`, which can compare two values of types differing only in their quantifiers, considering `None` equal to itself but no other value.

Quantifiers may apply within compound types; for example, if `a : Array[T?]+`, then `a[0] : T?`. An array type can be both non-empty and optional, `Array[T]+?`, such that the value can be `None` or a non-empty array, but not the empty array. (Reliance on this subtle distinction is discouraged, but it arises from other constructs such as a conditional call with an `Array[T]+` output.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's valid to discourage using Array[T]+?, consider you might have an optional input which may occur multiple times (eg. giving multiple bed/interval files to a variantcaller). It's optional, thus the ?, and giving an empty array might result in an invalid command.

eg:
There are two valid ways of calling a script: someScript or somScript -i 1.txt -i 2.txt ...
The command block might looks like this: someScript ~{true="-i " false="" defined(inputs)}~{sep="-i " inputs}

  • Using Array[File] or Array[File]+ would not allow for the first option.
  • Using Array[File]? would allow for an empty array which would prduce the invalid command someScript -i
  • So your only option in to use Array[File]+?.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think its fair to say the parentheses statement is more of my editorial opinion and can be stricken.


Interpolations with `~{}` and `${}` accept optional string expressions and substitute the empty string for `None` at runtime.

String concatenation with the `+` operator has special typing properties. When applied to two non-optional operands, the result is a non-optional `String`. However, if either operand has an optional type, then the concatenation has type `String?`, and the runtime result is `None` if either operand is `None`. These unusual semantics facilitate command-line flag interpolation. Consider this task:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not obvious to me, thank you for pointing it out. For ease of reference, this is the task:

task test {
  input {
    String? val
  }
  command {
    python script.py ${"--val=" + val}
  }
}

I would much rather not overload the + operator to handle optional strings in a special way. Perhaps the die is already cast, because ${"--val=" + val} is a much loved, common idiom. If at all possible, I would rather use a different symbol than +, to avoid overloading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically new, FWIW - right now it's only permissible as a special case in string interpolation, and an error elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the special case although I speculate it's actually a bit trickier to implement, propagating the flag of whether we're inside an interpolation or not when typechecking expressions...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems to open the Pandora's box for operators (although that might not be a bad thing), IE should we allow the same logic to be applied to other Types and operators?

Int? one
Int? two

Int? subtract = one - two
Int? add = one + two
Int? multiply = one * two
Int? all = ((one * two) + one) - two

To me, it seems like it could be a bad idea to overload the + operator and not do the same overloading on all other operators. I think it will likely cause confusion, and I am sure people will inevitably ask for it if in other contexts if we introduce it to string concatenation.

@@ -2327,9 +2327,17 @@ Then the command would be instantiated as:
/bin/mycmd /path/to/c.txt
```

## Prepending a String to an Optional Parameter
In general, it's invalid to supply an expression of optional type `T?` for a non-optional input or variable of type `T`, as the latter cannot accept `None`. (The idiom `select_first([expr, default])` coerces `expr : T?` to `T` by substituting `default : T` when `expr` is undefined; if `default` is omitted, then the coercion fails at runtime.) Similarly, an `Array[T]` expression does not satisfy an `Array[T]+` input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way of coercing Array[T] to Array[T]+ will need to be added.

Copy link
Member Author

@mlin mlin Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it will be problematic to statically enforce + without also adding such a coercion. We can also contemplate leaving + as more of a runtime assertion, with the downside of nonuniformly treating ? and +. Discussion on this welcome. cc @orodeh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlin, thanks for straightening out these issues.

My personal opinion is that having + as a non-empty array qualifier is fine, but I prefer that it's implementation be done as a runtime assertion. If it is part of the type system, then we run into a combinatorial explosion of types. For example:

Array[String], Array[String]+
Array[Array[String]], Array[Array[String]]+, Array[Array[String]+], Array[Array[String]+]+

However, I would argue that most people find two types sufficient here: Array[String] and Array[Array[String]].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind (/actively like) the fact that you can nest this as a type constraint inside other types. It makes the logic much easier to implement, at least...

I agree though that it should be a coercion (and thus enforced via a runtime check) to go from Array[X] to Array[X]+. Anything else (eg engine functions) is going to be a nightmare to apply correctly in the more complex cases here

Copy link
Member Author

@mlin mlin Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjllanwarne just clarifying -- do you want to see + enforced as a runtime assertion rather than statically? (I'm on the fence at this point)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ to me has always seemed like a strange type. To me its a more of a runtime decorator that is an assertion check as opposed to a Type check. I for one do not even mind if it is removed from the specification entirely, however that likely would cause people to raise their pitchforks in protest.

Realistically, the only guarantee that Array[X]+ provides is that Array[0] will not be None. This is not a strong enough argument to assert that it is a separate type completely IMO, and it SHOULD be treated as a normal Array[] type. Ie the following should be valid with no need to coercion:

Array[X]+ myArray
Array[X] otherArray = myArray

The one thing we need to be cognisant of however, is that be relegating the + type to a decorator and not a type, we essentially are introducing a new concept to WDL. That of Decorator expressions, which serve as runtime hints that should be enforced by the engine, but should not change the typeing of the variable they are on

Sometimes, optional parameters need a string prefix. Consider this task:
These constraints also apply to function arguments, with exceptions for string concatenation, described below, and the equality/inequality operators `==` and `!=`, which can compare two values of types differing only in their quantifiers, considering `None` equal to itself but no other value.

Quantifiers may apply within compound types; for example, if `a : Array[T?]+`, then `a[0] : T?`. An array type can be both non-empty and optional, `Array[T]+?`, such that the value can be `None` or a non-empty array, but not the empty array. (Reliance on this subtle distinction is discouraged, but it arises from other constructs such as a conditional call with an `Array[T]+` output.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the only valid form Array[T]+? or can it also be written Array[T]?+? The spec isn't explicit about this as far as I can see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that technically you can only apply + to arrays. Thus, since Array[T]? is not an array (it's an optional), the form Array[T]?+ would be invalid

@mlin
Copy link
Member Author

mlin commented Feb 25, 2019

@cjllanwarne @orodeh @DavyCats @pshapiro4broad to try the feedback on for size, I changed nonempty-array to be a runtime-only check, constrained the special string concatenation behavior to apply inside interpolations only, (and struck my editorial comment about Array[T]+?). I am personally on the fence about these changes (could change them back), the nice thing will be just to get to something unambiguous. See what you think!

@orodeh
Copy link
Contributor

orodeh commented Feb 26, 2019

Looks good to me. I am still not a fan of allowing + inside interpolation to take a combination of String? and String types.

@pshapiro4broad
Copy link
Contributor

@mlin This looks good to me. As you've done here, it would be nice to make the spec explicit everywhere about run-time vs compile-time behaviors; you shouldn't have to define terminology here for that. That would something for another PR though.

I am still not a fan of allowing + inside interpolation to take a combination of String? and String types.

I agree that it's messy but am not sure what a better solution would be. Maybe a different syntax for string interpolation that makes it clear that a list of fields must all not None for the interpolation to be evaluated.

@geoffjentry
Copy link
Member

@mlin where does this stand?

@mlin
Copy link
Member Author

mlin commented Apr 16, 2019

@geoffjentry I think it's ready for a 'last call' for comments before voting -- I'm happy to address anything that comes in

@geoffjentry geoffjentry added the voting active Changes for which voting is active. label Apr 19, 2019
@pshapiro4broad
Copy link
Contributor

+1

2 similar comments
@orodeh
Copy link
Contributor

orodeh commented Apr 19, 2019

👍

@multimeric
Copy link

👍

@geoffjentry
Copy link
Member

The ayes have it and this is moving to waiting for implementation. I'll lean on @cjllanwarne and @mlin to speak up when that point has been hit.

@geoffjentry geoffjentry added awaiting implementation Changes that are awaiting implementation. and removed voting active Changes for which voting is active. labels Apr 30, 2019
@geoffjentry geoffjentry removed the awaiting implementation Changes that are awaiting implementation. label Jun 21, 2019
@geoffjentry
Copy link
Member

Cromwell conforms to this behavior, making it implemented. Merging

@geoffjentry geoffjentry merged commit 714ec97 into master Jun 21, 2019
@geoffjentry geoffjentry deleted the mlin-quant-check branch June 21, 2019 21:13
@patmagee patmagee added this to the v2.0 milestone Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants