-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if calling force_encoding is necessary before doing so in Addressable::URI, replace with encode! where applicable #340
Conversation
calling force_encoding on them This prevents unnecessary mutation of these variables when already set and appropriately encoded. Necessary for our use case as we freeze our Addressable::URI objects to ensure they don't get changed when passed around
Calling force_encoding doesn't actually change anything about the string, it just changes what the string considers its encoding to be, allowing you to end up with strings with invalid encodings. There are some invocations in Addressable::URI where this seems intentional (followed by some g_sub'ing) but at least at the output stage it seems like the strings should be converted and the non-utf-8 bytes/characters removed
Check Before Re-encoding
if @normalized_fragment | ||
@normalized_fragment.force_encoding(Encoding::UTF_8) | ||
if @normalized_fragment && @normalized_fragment.encoding != Encoding::UTF_8 | ||
@normalized_fragment.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [85/80]
@@ -1796,8 +1808,8 @@ def normalized_fragment | |||
component == "" ? nil : component | |||
end | |||
# All normalized values should be UTF-8 | |||
if @normalized_fragment | |||
@normalized_fragment.force_encoding(Encoding::UTF_8) | |||
if @normalized_fragment && @normalized_fragment.encoding != Encoding::UTF_8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
@@ -1602,7 +1612,9 @@ def normalized_query(*flags) | |||
component == "" ? nil : component | |||
end | |||
# All normalized values should be UTF-8 | |||
@normalized_query.force_encoding(Encoding::UTF_8) if @normalized_query | |||
if @normalized_query && @normalized_query.encoding != Encoding::UTF_8 | |||
@normalized_query.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [82/80]
@@ -1531,7 +1539,9 @@ def normalized_path | |||
result | |||
end | |||
# All normalized values should be UTF-8 | |||
@normalized_path.force_encoding(Encoding::UTF_8) if @normalized_path | |||
if @normalized_path && @normalized_path.encoding != Encoding::UTF_8 | |||
@normalized_path.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/IndentationWidth: Use 2 (not 3) spaces for indentation.
Metrics/LineLength: Line is too long. [82/80]
@@ -1468,7 +1474,9 @@ def normalized_site | |||
site_string | |||
end | |||
# All normalized values should be UTF-8 | |||
@normalized_site.force_encoding(Encoding::UTF_8) if @normalized_site | |||
if @normalized_site && @normalized_site.encoding != Encoding::UTF_8 | |||
@normalized_site.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
@@ -1047,8 +1051,8 @@ def normalized_userinfo | |||
end | |||
end | |||
# All normalized values should be UTF-8 | |||
if @normalized_userinfo | |||
@normalized_userinfo.force_encoding(Encoding::UTF_8) | |||
if @normalized_userinfo && @normalized_userinfo.encoding != Encoding::UTF_8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
if @normalized_password | ||
@normalized_password.force_encoding(Encoding::UTF_8) | ||
if @normalized_password && @normalized_password.encoding != Encoding::UTF_8 | ||
@normalized_password.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [85/80]
@@ -977,8 +981,8 @@ def normalized_password | |||
end | |||
end | |||
# All normalized values should be UTF-8 | |||
if @normalized_password | |||
@normalized_password.force_encoding(Encoding::UTF_8) | |||
if @normalized_password && @normalized_password.encoding != Encoding::UTF_8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
@@ -920,7 +922,9 @@ def normalized_user | |||
end | |||
end | |||
# All normalized values should be UTF-8 | |||
@normalized_user.force_encoding(Encoding::UTF_8) if @normalized_user | |||
if @normalized_user && @normalized_user.encoding != Encoding::UTF_8 | |||
@normalized_user.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
@@ -865,7 +865,9 @@ def normalized_scheme | |||
end | |||
end | |||
# All normalized values should be UTF-8 | |||
@normalized_scheme.force_encoding(Encoding::UTF_8) if @normalized_scheme | |||
if @normalized_scheme && @normalized_scheme.encoding != Encoding::UTF_8 | |||
@normalized_scheme.encode!(Encoding::UTF_8, invalid: :replace, replace: "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [83/80]
@dentarg Removed those commits from history |
It looks like the second commit is causing compatibility issues with previous ruby versions so I am going to reopen this PR with just that first commit in a separate branch. |
The first commit is a quick fix to this issue, just adding in an extra check to avoid mutating strings when it isn't necessary. The second sets up handling for the off chance that the normalization process doesn't succeed at removing all of the utf-8 invalid characters.
force_encoding
just changes what the string reports its encoding is, usingstr.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
will change the coding and replace any invalid bytes with an empty string. It's possibly not strictly necessary, but it does seem to be the intent of forcing the encoding throughout this file.