Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline string implementation #1542

Merged
merged 7 commits into from
Feb 19, 2021
Merged

Multiline string implementation #1542

merged 7 commits into from
Feb 19, 2021

Conversation

anthony-c-martin
Copy link
Member

@anthony-c-martin anthony-c-martin commented Feb 12, 2021

Contributing a Pull Request

If you haven't already, read the full contribution guide. The guide may have changed since the last time you read it, so please double-check. Once you are done and ready to submit your PR, run through the relevant checklist below.

Contributing a feature

  • I have opened a new issue for the proposal, or commented on an existing one, and ensured that the bicep maintainers are good with the design of the feature being implemented
  • I have included "Fixes #{issue_number}" in the PR description, so GitHub can link to the issue and close it when the PR is merged
  • I have appropriate test coverage of my new feature

Fixes #416

@codecov-io
Copy link

codecov-io commented Feb 12, 2021

Codecov Report

Merging #1542 (4fe963d) into main (c8326cb) will decrease coverage by 0.00%.
The diff coverage is 94.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1542      +/-   ##
==========================================
- Coverage   94.92%   94.92%   -0.01%     
==========================================
  Files         356      356              
  Lines       19258    19332      +74     
  Branches       14       14              
==========================================
+ Hits        18281    18351      +70     
- Misses        977      981       +4     
Flag Coverage Δ
dotnet 95.40% <94.59%> (-0.01%) ⬇️
typescript 27.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Bicep.LangServer/SemanticTokenVisitor.cs 0.00% <ø> (ø)
src/Bicep.Core/Parsing/Parser.cs 97.90% <83.33%> (-0.16%) ⬇️
src/Bicep.Core/Parsing/Lexer.cs 97.72% <90.00%> (-0.73%) ⬇️
...Core.IntegrationTests/Emit/TemplateEmitterTests.cs 100.00% <100.00%> (ø)
src/Bicep.Core.UnitTests/Parsing/LexerTests.cs 100.00% <100.00%> (ø)
src/Bicep.Core.UnitTests/Parsing/ParserTests.cs 100.00% <100.00%> (ø)
src/Bicep.Core/Diagnostics/DiagnosticBuilder.cs 100.00% <100.00%> (ø)

@anthony-c-martin anthony-c-martin changed the title [WIP] Multiline strings Multiline string implementation Feb 13, 2021
@anthony-c-martin anthony-c-martin marked this pull request as ready for review February 13, 2021 07:12
@anthony-c-martin
Copy link
Member Author

One thing that is awkward:

If I type ' to start a multiline string, VSCode inserts a terminating ' so I have ''. If I think type another ' (to make it 3 quotes), VSCode does the same thing again, so I end up with 4.

To get VSCode to give me 3 quotes I have to enter the following:

  • ', ', ', ', backspace

If I try ', ', ', backspace, I end up with 2 quotes.

I assume this is related to the config in src/vscode-bicep/language-configuration.json, I'm just not sure how to sort it out.

@alex-frankel
Copy link
Collaborator

Noticed this as well - maybe it's easiest to switch to a different character like backtick? Would be a good question for Anders

@majastrz
Copy link
Member

What happens if you just add the new type of string tokens to language-configuration.json? Wondering if VS code has some logic to handle this situation...

@anthony-c-martin
Copy link
Member Author

What happens if you just add the new type of string tokens to language-configuration.json? Wondering if VS code has some logic to handle this situation...

Turns out I should have just tried this out :) I've pushed the changes for this.

Now if I type ''', VSCode inserts an additional ''', with the cursor (|) as such '''|'''. It's still slightly awkward because '''''' is interpreted as an opening sequence for a multiline string with 6 consecutive quotes, so this shows a syntax error until the user inserts a character.

@alex-frankel
Copy link
Collaborator

We might have control over what is inserted on a completion like that. I think you can add a new line and potentially move the second ''' to position zero since that white space matters.

@shenglol
Copy link
Contributor

@anthony-c-martin - this might be a rare case, but is it possible to have a multiline string end with '? l could not figure out a way to do that:
image

@anthony-c-martin
Copy link
Member Author

@anthony-c-martin - this might be a rare case, but is it possible to have a multiline string end with '? l could not figure out a way to do that:
image

No - this would be blocked by this approach since the lexer is looking for exactly 3 ' chars to terminate the string, and will keep scanning if it sees an additional '.

If we were to relax this, you would always need to pick an opening sequence of ' n chars with n > the longest sequence of ' chars in the string you're including - e.g. this would not be possible:

var myString = '''
abc''''''''def
'''

and would instead have to be written as:

var myString = '''''''''
abc''''''''def
'''''''''

I went with this approach to give more freedom to pick the opening sequence, and we hypothesized that the majority of use-cases would not be sensitive to whether or not the string is terminated with a newline.

@shenglol
Copy link
Contributor

@anthony-c-martin - this might be a rare case, but is it possible to have a multiline string end with '? l could not figure out a way to do that:
image

No - this would be blocked by this approach since the lexer is looking for exactly 3 ' chars to terminate the string, and will keep scanning if it sees an additional '.

If we were to relax this, you would always need to pick an opening sequence of ' n chars with n > the longest sequence of ' chars in the string you're including - e.g. this would not be possible:

var myString = '''
abc''''''''def
'''

and would instead have to be written as:

var myString = '''''''''
abc''''''''def
'''''''''

I went with this approach to give more freedom to pick the opening sequence, and we hypothesized that the majority of use-cases would not be sensitive to whether or not the string is terminated with a newline.

Gotcha. This makes sense. Thanks for the explanation.

@majastrz
Copy link
Member

We might have control over what is inserted on a completion like that. I think you can add a new line and potentially move the second ''' to position zero since that white space matters.

It's unfortunately not a completion but a different VS code feature enabled via config. So the snippet solution might now work unless we specifically included that completion everywhere.

@majastrz
Copy link
Member

majastrz commented Feb 18, 2021

Question on the issue @shenglol noticed above. Why can't we have the lexer scan closing quotes and subtract the expected number of closing quotes from the number actually specified? Those that remain could then be included in the string value, no?

@anthony-c-martin
Copy link
Member Author

Question on the issue @shenglol noticed above. Why can't we have the lexer scan closing quotes and subtract the expected number of closing quotes from the number actually specified? Those that remain could then be included in the string value, no?

Take a look at my response. That's definitely a valid option, it just puts more of a constraint on the choice of opening quotes.

stringLeftPiece -> "'" STRINGCHAR* "${"
stringMiddlePiece -> "}" STRINGCHAR* "${"
stringRightPiece -> "}" STRINGCHAR* "'"
stringComplete -> "'" STRINGCHAR* "'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STRINGCHAR [](start = 22, length = 10)

Not a new issue and definitely non-blocking for this PR, but I just realized we haven't defined what STRINGCHAR is or even IDENTIFIER.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :) I decided to leave it along with this PR

var startOffset = terminatingQuoteCount;

// we strip a leading \r\n or \n
if (tokenText[startOffset] == '\r')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokenText [](start = 16, length = 9)

This matches \r\n | \n, but I believe in the lexer elsewhere we match [\r|\n]+ elsewhere. It's subtly inconsistent, but not an issue for CRLF or LF files. I think it's fine for now, but I think we need to centralize our newline handling in the future. To finish unicode support post-0.3, we will also need to support line break and paragraph break chars.

@@ -409,6 +450,45 @@ private void ScanNewLine()
}
}

private TokenType ScanMultilineString()
{
const int terminatingQuoteCount = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

terminatingQuoteCount [](start = 22, length = 21)

This constant should be at the class level and shared between the two functions.

public ErrorDiagnostic UnterminatedMultilineString() => new(
TextSpan,
"BCP140",
$"The multi-line string at this location is not terminated. Terminate it with \"'''\".");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multi-line string at this location is not terminated. [](start = 18, length = 57)

I don't see a test that produces this error. Can you add a unit or integration test to cover it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind - found it.


In reply to: 579478035 [](ancestors = 579478035)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although could you add an integration test so it's more obvious where the error span lands?


In reply to: 579478358 [](ancestors = 579478358,579478035)

Copy link
Member

@majastrz majastrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed off with a couple of comments.

@anthony-c-martin anthony-c-martin enabled auto-merge (squash) February 19, 2021 21:24
@anthony-c-martin anthony-c-martin merged commit c80f38f into main Feb 19, 2021
@anthony-c-martin anthony-c-martin deleted the antmarti/multilines branch February 19, 2021 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiline string variables
5 participants