Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for string values #58

Merged
merged 11 commits into from
Jun 11, 2020
6 changes: 0 additions & 6 deletions ion/integration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,6 @@ var textRoundTripSkipList = []string{
"lists.ion",
"localSymbolTableImportZeroMaxId.ion",
"nonNulls.ion",
"nonNulls.ion",
"notVersionMarkers.ion",
"structWhitespace.ion",
"subfieldInt.ion",
Expand Down Expand Up @@ -204,7 +203,6 @@ var malformedIonsSkipList = []string{
"localSymbolTableWithMultipleSymbolsAndImportsFields.ion",
"localSymbolTableWithMultipleSymbolsFields.10n",
"localSymbolTableWithMultipleSymbolsFields.ion",
"longStringRawControlCharacter.ion",
"minLongWithLenTooLarge.10n",
"minLongWithLenTooSmall.10n",
"negativeIntZero.10n",
Expand All @@ -213,8 +211,6 @@ var malformedIonsSkipList = []string{
"nopPadWithAnnotations.10n",
"nullDotCommentInt.ion",
"sexpOperatorAnnotation.ion",
"stringLenTooLarge.10n",
"stringRawControlCharacter.ion",
"stringWithLatinEncoding.10n",
"structOrderedEmpty.10n",
"surrogate_1.ion",
Expand All @@ -235,7 +231,6 @@ var malformedIonsSkipList = []string{

var equivsSkipList = []string{
"annotatedIvms.ion",
"clobs.ion",
"localSymbolTableAppend.ion",
"localSymbolTableNullSlots.ion",
"localSymbolTableWithAnnotations.ion",
Expand All @@ -244,7 +239,6 @@ var equivsSkipList = []string{
"nonIVMNoOps.ion",
"sexps.ion",
"stringUtf8.ion",
"strings.ion",
"structsFieldsDiffOrder.ion",
"structsFieldsRepeatedNames.ion",
"systemSymbols.ion",
Expand Down
40 changes: 33 additions & 7 deletions ion/tokenizer.go
Original file line number Diff line number Diff line change
Expand Up @@ -542,11 +542,11 @@ func (t *tokenizer) readString() (string, error) {
if err != nil {
return "", err
}

switch c {
case -1, '\n':
if isProhibitedControlChar(c) || c == '\n' {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should explicitly handle the EOF/-1 case here and below. isProhibitedControlChar will catch it, but the method name makes me think it's only looking for ASCII control characters, which doesn't include -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but that means going back to this comment.
We can either change the function name to something more generic like invalidStringCharacter or put the logic before this commit back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a subtle difference here. We've gone from:

if isInvalidChar(c) {
    // Doesn't handle EOF, but could have based on the name
}
switch c {
case -1: // Handles EOF

to

if isProhibitedControlChar(c) {
     // Handles EOF, but shouldn't based on the name.
}

I think isProhibitedControlChar is a more precise/communicative name, so I'd like to keep it. But that means we need the explicit EOF check:

if c == -1 || c == '\n' || isProhibitedControlChar(c) {
    // Handles EOF
}

I think this is especially helpful since someone new to the codebase could reasonably expect EOF to be handled by the

if err != nil {
// ...
}

above, since err is how the standard library reports EOF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Updated the pull request. Thanks.

return "", t.invalidChar(c)
}

switch c {
case '"':
return ret.String(), nil

Expand Down Expand Up @@ -582,20 +582,24 @@ func (t *tokenizer) readLongString() (string, error) {
if err != nil {
return "", err
}

switch c {
case -1:
if isProhibitedControlChar(c) {
return "", t.invalidChar(c)
}

switch c {
case '\'':
startPosition := t.pos
ok, err := t.skipEndOfLongString(t.skipCommentsHandler)
if err != nil {
return "", err
}
if ok {
return ret.String(), nil
}

if startPosition == t.pos {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like skipEndOfLongString reports whether it consumed the string ending, so I think you can replace this position check with an else {...} attached to the if ok {...}.

wasConsumed, err := t.skipEndOfLongString(t.skipCommentsHandler)
if err != nil {
	return "", err
}
if wasConsumed {
	return ret.String(), nil
} else {
        // The ' was not part of a long string ending.
        ret.writeByte(byte(c))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll also return false if it skipped over a ''' /* pause in the string */ ''' sequence, in which case you don't want to keep the '. :( This seems correct to me in the short term, skipEndOfLongString should probably should be refactored to return a tri-state in the longer term.

Copy link
Contributor Author

@R-maan R-maan Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I almost refactored skipEndOfLongString but then I decided to KISS.

#61 to change skipEndOfLongString

// No character has been consumed. It is single '.
ret.WriteByte(byte(c))
}
case '\\':
c, err = t.peek()
if err != nil {
Expand Down Expand Up @@ -1263,3 +1267,25 @@ func (t *tokenizer) unread(c int) {
t.pos--
t.buffer = append(t.buffer, c)
}

func isProhibitedControlChar(c int) bool {
// Values lower than this are non-displayable ASCII characters; except for new line and white space characters.
if c > 0x1F {
return false
}
if isStringWhitespace(c) || isNewLineChar(c) {
return false
}
return true
}

func isStringWhitespace(c int) bool {
return c == 0x09 || //horizontal tab
c == 0x0B || //vertical tab
c == 0x0C // form feed
}

func isNewLineChar(c int) bool {
return c == 0x0A || //new line
c == 0x0D //carriage return
}