Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WDL Biscayne] Move escape evaluation into Cromwell (and make it work) #4427

Merged
merged 8 commits into from
Dec 13, 2018

Conversation

cjllanwarne
Copy link
Contributor

@cjllanwarne cjllanwarne commented Nov 27, 2018

Problems with the old method:

  • Quotes were not being escaped
    • eg this would fail: String s = "a\"b"
  • Hex codes were not being interpreted by wdl_unescape:
    • eg this would fail: String hex_hello = "\x68\x65\x6C\x6c\x6F"
  • Double length unicode codepoints were not being parsed:
    • eg this would fail: String unicode_hello = "\u0068\U00000065\u006C\U0000006C\u006F"

New method:

  • Parse escape sequences in hermes grammar
  • Allow Cromwell to interpret the escape sequences as it needs to

The main benefit to this was not having to mess with the wdl_unescape method in hermes to get the unrespected escape types to work.

val FourDigitUnicode = "\\\\u([0-9a-fA-F]{4})".r
val EightDigitUnicode = "\\\\U([0-9a-fA-F]{8})".r

def parseEscapeSequence(seq: String): ErrorOr[StringEscapeSequence] = seq match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test a wider range of escape sequences in string_escaping.wdl - will these still work?

Of course, we are allowed to change the behavior, with consensus - I doubt they see much use.

Copy link
Contributor Author

@cjllanwarne cjllanwarne Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not touching the WDL 1.0 parser in this change, which means that any WDL 1.0 string literal runs through the hermes-encapsulated wdl_unescape function and becomes a plain old :string.

This code will only run over :escape tokens.

And yes, it's deliberate that the set is smaller in Biscayne - per openwdl/wdl#247

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear I referred to the 1.0 file because it's just a convenient list to point at

@geoffjentry
Copy link
Collaborator

@cjllanwarne Please remind me when this is merged as IMO this, plus your new openwdl PR fully satisfy the implementation requirement for the underlying wdl spec PR

Copy link
Collaborator

@aednichols aednichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

@@ -13,7 +13,6 @@ class WomtoolValidateSpec extends FlatSpec with Matchers {

behavior of "womtool validate"


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you have strong opinions on this line, I'd revert this to keep the file from showing up in the change set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because IMO two-line-breaks are worthy of shrinking (but mainly to avoid having to re-run this PR through Travis) I'll leave this in unless you strongly object?

sealed trait StringEscapeSequence extends StringPiece {
def unescape: String
}
case object NewlineEscape extends StringEscapeSequence { override val unescape: String = System.lineSeparator }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would either change parseEscapeSequence() to also use something like case System.lineSeparator, or change this to be override val unescape = "\\n".

Otherwise on some systems the values may not match. Even if we don't support those systems, I still prefer consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're actually subtly different - parseEscapeSequence is looking at what escape sequence was in the WDL (eg if my script has String s = "\n") - this line is specifying what value to replace the escape sequence with in the resulting scala String value

@cjllanwarne cjllanwarne merged commit be1ee39 into develop Dec 13, 2018
@cjllanwarne cjllanwarne deleted the cjl_biscayne_escapes branch December 13, 2018 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants