Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if calling force_encoding is necessary before doing so in Addressable::URI, replace with encode! where applicable #340

Closed
wants to merge 3 commits into from

Conversation

ACBullen
Copy link
Contributor

The first commit is a quick fix to this issue, just adding in an extra check to avoid mutating strings when it isn't necessary. The second sets up handling for the off chance that the normalization process doesn't succeed at removing all of the utf-8 invalid characters. force_encoding just changes what the string reports its encoding is, using str.encode!(Encoding::UTF_8, invalid: :replace, replace: "") will change the coding and replace any invalid bytes with an empty string. It's possibly not strictly necessary, but it does seem to be the intent of forcing the encoding throughout this file.

ACBullen and others added 3 commits April 17, 2019 17:03
calling force_encoding on them

This prevents unnecessary mutation of these variables when already set
and appropriately encoded. Necessary for our use case as we freeze our
Addressable::URI objects to ensure they don't get changed when passed around
Calling force_encoding doesn't actually change anything about the string,
it just changes what the string considers its encoding to be, allowing you
to end up with strings with invalid encodings. There are some invocations
in Addressable::URI where this seems intentional (followed by some g_sub'ing)
but at least at the output stage it seems like the strings should be converted
and the non-utf-8 bytes/characters removed
if @normalized_fragment
@normalized_fragment.force_encoding(Encoding::UTF_8)
if @normalized_fragment && @normalized_fragment.encoding != Encoding::UTF_8
@normalized_fragment.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [85/80]

@@ -1796,8 +1808,8 @@ def normalized_fragment
component == "" ? nil : component
end
# All normalized values should be UTF-8
if @normalized_fragment
@normalized_fragment.force_encoding(Encoding::UTF_8)
if @normalized_fragment && @normalized_fragment.encoding != Encoding::UTF_8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

@@ -1602,7 +1612,9 @@ def normalized_query(*flags)
component == "" ? nil : component
end
# All normalized values should be UTF-8
@normalized_query.force_encoding(Encoding::UTF_8) if @normalized_query
if @normalized_query && @normalized_query.encoding != Encoding::UTF_8
@normalized_query.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [82/80]

@@ -1531,7 +1539,9 @@ def normalized_path
result
end
# All normalized values should be UTF-8
@normalized_path.force_encoding(Encoding::UTF_8) if @normalized_path
if @normalized_path && @normalized_path.encoding != Encoding::UTF_8
@normalized_path.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layout/IndentationWidth: Use 2 (not 3) spaces for indentation.
Metrics/LineLength: Line is too long. [82/80]

@@ -1468,7 +1474,9 @@ def normalized_site
site_string
end
# All normalized values should be UTF-8
@normalized_site.force_encoding(Encoding::UTF_8) if @normalized_site
if @normalized_site && @normalized_site.encoding != Encoding::UTF_8
@normalized_site.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

@@ -1047,8 +1051,8 @@ def normalized_userinfo
end
end
# All normalized values should be UTF-8
if @normalized_userinfo
@normalized_userinfo.force_encoding(Encoding::UTF_8)
if @normalized_userinfo && @normalized_userinfo.encoding != Encoding::UTF_8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

if @normalized_password
@normalized_password.force_encoding(Encoding::UTF_8)
if @normalized_password && @normalized_password.encoding != Encoding::UTF_8
@normalized_password.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [85/80]

@@ -977,8 +981,8 @@ def normalized_password
end
end
# All normalized values should be UTF-8
if @normalized_password
@normalized_password.force_encoding(Encoding::UTF_8)
if @normalized_password && @normalized_password.encoding != Encoding::UTF_8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

@@ -920,7 +922,9 @@ def normalized_user
end
end
# All normalized values should be UTF-8
@normalized_user.force_encoding(Encoding::UTF_8) if @normalized_user
if @normalized_user && @normalized_user.encoding != Encoding::UTF_8
@normalized_user.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

@@ -865,7 +865,9 @@ def normalized_scheme
end
end
# All normalized values should be UTF-8
@normalized_scheme.force_encoding(Encoding::UTF_8) if @normalized_scheme
if @normalized_scheme && @normalized_scheme.encoding != Encoding::UTF_8
@normalized_scheme.encode!(Encoding::UTF_8, invalid: :replace, replace: "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [83/80]

@dentarg
Copy link
Collaborator

dentarg commented Apr 18, 2019

@ACBullen do you mind removing the "pullbot" commits from the history here? I.e. just keeping bfc9a44 and 686dbbb. I think that's make sense if this is to be merged.

@ACBullen
Copy link
Contributor Author

@dentarg Removed those commits from history

@ACBullen
Copy link
Contributor Author

It looks like the second commit is causing compatibility issues with previous ruby versions so I am going to reopen this PR with just that first commit in a separate branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants