Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: (plz--coding-system) Extract charset from media type #66

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

josephmturner
Copy link
Contributor

Previously, plz--coding-system always returned nil since coding-system-from-name expects a coding system name string like "UTF-8".

Previously, plz--coding-system always returned nil since
coding-system-from-name expects a coding system name string like
"UTF-8".
Copy link

@basil-conto basil-conto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pinging in support of this patch because plz does not currently decode non-UTF-8 responses, despite claiming to.

To illustrate with a real example, plz--coding-system fails to extract the correct charset from a header like (content-type . "text/javascript; charset=ISO-8859-1").

(coding-system-from-name content-type))))
(when-let* ((headers (or headers (plz--headers)))
(raw-content-type (alist-get 'content-type headers))
(content-type (downcase raw-content-type))
Copy link

@basil-conto basil-conto Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.

Copy link

@basil-conto basil-conto Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library? See (info "(emacs-mime) Interface Functions"). For example:

(require 'mail-parse)
(mail-content-type-get
 (mail-header-parse-content-type "text/javascript; charset=ISO-8859-1")
 'charset)
;; => "ISO-8859-1"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.

Thanks, that's a good idea. @josephmturner Would you like to add that, or should I?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library?

No, I wasn't aware of that function. Given the simplicity of this scenario, I think I'd prefer to not have to load that library. Looking at the rfc2231 library it is aliased to, while that library is certainly useful, the function rfc2231-parse-string which gets called looks large, complicated, and would probably be overkill for this. I can't imagine it would help performance to use that function, either. Thanks for pointing it out, though. :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine it would help performance to use that function

Neither does making HTTP requests ;).

@alphapapa
Copy link
Owner

Just pinging in support of this patch because plz does not currently decode non-UTF-8 responses, despite claiming to.

To illustrate with a real example, plz--coding-system fails to extract the correct charset from a header like (content-type . "text/javascript; charset=ISO-8859-1").

@basil-conto Thank you for reminding me. Would you be willing to contribute a test case for this? That would make it easy to apply this as a fix to the stable release branch.

@alphapapa alphapapa self-assigned this Mar 25, 2025
@alphapapa alphapapa added the bug Something isn't working label Mar 25, 2025
@alphapapa alphapapa added this to the v0.9.2 milestone Mar 25, 2025
@basil-conto
Copy link

@basil-conto Thank you for reminding me. Would you be willing to contribute a test case for this?

Sure, but I'd appreciate some pointers: were you thinking of a test that gets httpbin to return something non-UTF-8-encoded? (Does it even support that? postmanlabs/httpbin#427)
Or just a unit test for extracting the correct charset from a Content-Type header?
Or something else?

@alphapapa
Copy link
Owner

@basil-conto Good question, of course. Ideally we'd have both a unit test for the header parsing, and a test that actually receives non-UTF8-encoded content and verifies that it is decoded properly. If httpbin doesn't make that possible, we might have to get by with a unit test for the header, and a manual test of the functionality, I guess. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants