[WDL Biscayne] Move escape evaluation into Cromwell (and make it work) #4427
val FourDigitUnicode = "\\\\u([0-9a-fA-F]{4})".r
val EightDigitUnicode = "\\\\U([0-9a-fA-F]{8})".r

def parseEscapeSequence(seq: String): ErrorOr[StringEscapeSequence] = seq match {
We test a wider range of escape sequences in string_escaping.wdl - will these still work?
Of course, we are allowed to change the behavior, with consensus - I doubt they see much use.
I'm not touching the WDL 1.0 parser in this change, which means that any WDL 1.0 string literal runs through the hermes-encapsulated wdl_unescape function and becomes a plain old :string.
This code will only run over :escape tokens.
And yes, it's deliberate that the set is smaller in Biscayne - per openwdl/wdl#247
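For readers following the thread, here is a minimal sketch of how a match over a single escape token might look. It reuses the two regexes from the diff, but everything else is an assumption: the object name EscapeSketch, the helper fromCodePoint, the small set of single-character escapes, and the use of Either in place of Cromwell's ErrorOr are illustrative only and are not taken from the PR.

```scala
object EscapeSketch {
  // Regexes as they appear in the diff: each matches the escape as written in
  // WDL source, i.e. a literal backslash, then 'u' or 'U', then hex digits.
  val FourDigitUnicode = "\\\\u([0-9a-fA-F]{4})".r
  val EightDigitUnicode = "\\\\U([0-9a-fA-F]{8})".r

  // Hypothetical helper: turns a hex code point into the String it denotes.
  // Long parsing is used because eight hex digits can exceed Int.MaxValue.
  private def fromCodePoint(hex: String): Either[String, String] = {
    val codePoint = java.lang.Long.parseLong(hex, 16)
    if (codePoint <= Character.MAX_CODE_POINT)
      Right(new String(Character.toChars(codePoint.toInt)))
    else
      Left(s"Code point out of range: $hex")
  }

  // Assumed subset of escapes; the real Biscayne set is defined by the spec PR.
  def decodeEscape(seq: String): Either[String, String] = seq match {
    case "\\n" => Right("\n")
    case "\\t" => Right("\t")
    case "\\\"" => Right("\"")
    case "\\\\" => Right("\\")
    case FourDigitUnicode(hex) => fromCodePoint(hex)
    case EightDigitUnicode(hex) => fromCodePoint(hex)
    case other => Left(s"Unrecognized escape sequence: $other")
  }
}
```

Because a Scala regex used as an extractor only matches when the whole input matches, a token with trailing characters falls through to the error case rather than being partially decoded.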
To be clear I referred to the 1.0 file because it's just a convenient list to point at
@cjllanwarne Please remind me when this is merged as IMO this, plus your new openwdl PR, fully satisfy the implementation requirement for the underlying wdl spec PR.
👍 LGTM
@@ -13,7 +13,6 @@ class WomtoolValidateSpec extends FlatSpec with Matchers {

behavior of "womtool validate"

Unless you have strong opinions on this line, I'd revert this to keep the file from showing up in the change set.
Because IMO two-line-breaks are worthy of shrinking (but mainly to avoid having to re-run this PR through Travis) I'll leave this in unless you strongly object?
sealed trait StringEscapeSequence extends StringPiece {
def unescape: String
}
case object NewlineEscape extends StringEscapeSequence { override val unescape: String = System.lineSeparator }
I would either change parseEscapeSequence() to also use something like case System.lineSeparator, or change this to be override val unescape = "\\n".
Otherwise on some systems the values may not match. Even if we don't support those systems, I still prefer consistency.
They're actually subtly different - parseEscapeSequence is looking at what escape sequence was in the WDL (e.g. if my script has String s = "\n") - this line is specifying what value to replace the escape sequence with in the resulting Scala String value.
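The distinction drawn here can be illustrated with a small sketch. The names FixedNewline, PlatformNewline and NewlineSketch are hypothetical and the trait is simplified (it does not extend StringPiece); the point is only that the text matched in the WDL source is always the two characters backslash and n, while the replacement value is a separate choice.

```scala
sealed trait StringEscapeSequence { def unescape: String }

// Hypothetical variant that always substitutes a single line-feed character.
case object FixedNewline extends StringEscapeSequence {
  override val unescape: String = "\n"
}

// Hypothetical variant that follows the platform, as in the line under review.
case object PlatformNewline extends StringEscapeSequence {
  override val unescape: String = System.lineSeparator
}

object NewlineSketch {
  // What the parser sees in the WDL source: two characters, '\' and 'n'.
  val escapeInSource: String = "\\n"

  // Matching is done on the source text, independently of the replacement.
  def parse(seq: String): Option[StringEscapeSequence] =
    if (seq == escapeInSource) Some(FixedNewline) else None
}
```

With this split, the parser side never needs to mention System.lineSeparator; only the chosen unescape value decides whether output differs across platforms.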
Problems with the old method:
String s = "a\"b"
wdl_unescape
:String hex_hello = "\x68\x65\x6C\x6c\x6F"
String unicode_hello = "\u0068\U00000065\u006C\U0000006C\u006F"
New method:
The main benefit to this was not having to mess with the wdl_unescape method in hermes to get the unrespected escape types to work.