Fix: (plz--coding-system) Extract charset from media type #66

josephmturner · 2024-08-21T07:19:16Z

Previously, plz--coding-system always returned nil since coding-system-from-name expects a coding system name string like "UTF-8".

basil-conto

Just pinging in support of this patch because plz does not currently decode non-UTF-8 responses, despite claiming to.

To illustrate with a real example, plz--coding-system fails to extract the correct charset from a header like (content-type . "text/javascript; charset=ISO-8859-1").
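A minimal sketch of why the charset matters, using only built-in Emacs functions rather than plz itself. The byte #xE9 is "é" in ISO-8859-1 but is not a valid UTF-8 sequence on its own, so decoding it with the wrong coding system does not recover the intended text. The variable name is made up for illustration.

```elisp
;; Illustration only: one ISO-8859-1 encoded byte, as a server using
;; that charset might send it for the character "é".
(defvar my/latin-1-bytes (unibyte-string #xE9)
  "Hypothetical response body: \"é\" encoded as ISO-8859-1.")

;; Decoding with the charset named in the header recovers the text.
(decode-coding-string my/latin-1-bytes 'iso-8859-1) ; equal to "é"

;; Decoding the same byte as UTF-8 does not: #xE9 alone is not a
;; valid UTF-8 sequence, so the result is not the string "é".
(decode-coding-string my/latin-1-bytes 'utf-8)
```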

basil-conto · 2025-03-17T16:47:05Z

plz.el

-      (coding-system-from-name content-type))))
+  (when-let* ((headers (or headers (plz--headers)))
+              (raw-content-type (alist-get 'content-type headers))
+              (content-type (downcase raw-content-type))


Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.
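For reference, the case-fold-search suggestion could look roughly like the sketch below. The function name and the regexp are assumptions made for illustration; they are not taken from plz or from the patch.

```elisp
(defun my/content-type-charset (content-type)
  "Return the charset parameter of CONTENT-TYPE as a string, or nil.
Hypothetical helper for illustration; it is not part of plz."
  ;; Binding `case-fold-search' makes the match case-insensitive
  ;; without first copying the whole header value via `downcase'.
  (let ((case-fold-search t))
    (when (string-match "charset=\"?\\([^;\" \t]+\\)" content-type)
      (match-string 1 content-type))))

(my/content-type-charset "text/javascript; Charset=ISO-8859-1")
;; => "ISO-8859-1"
(my/content-type-charset "text/plain")
;; => nil
```

Note that the returned string keeps the header's original case, so whatever consumes it would need to accept names like "ISO-8859-1" as well as "iso-8859-1".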

Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library? See (info "(emacs-mime) Interface Functions"). For example:

(require 'mail-parse)
(mail-content-type-get
 (mail-header-parse-content-type "text/javascript; charset=ISO-8859-1")
 'charset)
;; => "ISO-8859-1"

Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.

Thanks, that's a good idea. @josephmturner Would you like to add that, or should I?

Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library?

No, I wasn't aware of that function. Given the simplicity of this scenario, I think I'd prefer to not have to load that library. Looking at the rfc2231 library it is aliased to, while that library is certainly useful, the function rfc2231-parse-string which gets called looks large, complicated, and would probably be overkill for this. I can't imagine it would help performance to use that function, either. Thanks for pointing it out, though. :)

I can't imagine it would help performance to use that function

Neither does making HTTP requests ;).

alphapapa · 2025-03-25T23:15:10Z

Just pinging in support of this patch because plz does not currently decode non-UTF-8 responses, despite claiming to.

To illustrate with a real example, plz--coding-system fails to extract the correct charset from a header like (content-type . "text/javascript; charset=ISO-8859-1").

@basil-conto Thank you for reminding me. Would you be willing to contribute a test case for this? That would make it easy to apply this as a fix to the stable release branch.

basil-conto · 2025-03-26T10:16:18Z

@basil-conto Thank you for reminding me. Would you be willing to contribute a test case for this?

Sure, but I'd appreciate some pointers: were you thinking of a test that gets httpbin to return something non-UTF-8-encoded? (Does it even support that? postmanlabs/httpbin#427)
Or just a unit test for extracting the correct charset from a Content-Type header?
Or something else?

alphapapa · 2025-03-27T22:48:39Z

@basil-conto Good question, of course. Ideally we'd have both a unit test for the header parsing, and a test that actually receives non-UTF8-encoded content and verifies that it is decoded properly. If httpbin doesn't make that possible, we might have to get by with a unit test for the header, and a manual test of the functionality, I guess. :)
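As a rough idea of what the two kinds of test might look like, here is a self-contained ERT sketch. It does not load plz: my/header-coding-system is a hypothetical stand-in for the function under test, and the decoding test feeds bytes directly to decode-coding-string instead of fetching them over HTTP. A real test would call plz's own functions and, if available, a server that returns non-UTF-8 content.

```elisp
(require 'ert)

(defun my/header-coding-system (headers)
  "Return the coding system named by the charset in HEADERS, or nil.
HEADERS is an alist like ((content-type . \"text/html; charset=utf-8\")).
Hypothetical stand-in for the function under test; not plz's code."
  (let* ((raw (alist-get 'content-type headers))
         (content-type (and raw (downcase raw))))
    (when (and content-type
               (string-match "charset=\\([^; \t]+\\)" content-type))
      (let ((name (intern (match-string 1 content-type))))
        ;; Only return symbols that Emacs knows as coding systems.
        (and (coding-system-p name) name)))))

(ert-deftest my/header-coding-system-test ()
  "Charset is extracted regardless of case; a missing charset gives nil."
  (should (eq 'iso-8859-1
              (my/header-coding-system
               '((content-type . "text/javascript; charset=ISO-8859-1")))))
  (should (eq 'utf-8
              (my/header-coding-system
               '((content-type . "application/json; Charset=UTF-8")))))
  (should-not (my/header-coding-system '((content-type . "text/plain")))))

(ert-deftest my/decode-latin-1-test ()
  "A byte encoded as ISO-8859-1 decodes to the expected character."
  (should (equal (string #xE9)
                 (decode-coding-string (unibyte-string #xE9) 'iso-8859-1))))
```

Saved to a file, these could be run with M-x ert, or in batch mode by loading the file and calling ert-run-tests-batch-and-exit.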

Fix: (plz--coding-system) Extract charset from media type

326a18f

Previously, plz--coding-system always returned nil since coding-system-from-name expects a coding system name string like "UTF-8".

basil-conto approved these changes Mar 17, 2025

alphapapa self-assigned this Mar 25, 2025

alphapapa added the bug Something isn't working label Mar 25, 2025

alphapapa added this to the v0.9.2 milestone Mar 25, 2025
