Fix: (plz--coding-system) Extract charset from media type #66
base: master
Previously, plz--coding-system always returned nil since coding-system-from-name expects a coding system name string like "UTF-8".
Just pinging in support of this patch because plz does not currently decode non-UTF-8 responses, despite claiming to.
To illustrate with a real example, plz--coding-system fails to extract the correct charset from a header like (content-type . "text/javascript; charset=ISO-8859-1").
(coding-system-from-name content-type))))
(when-let* ((headers (or headers (plz--headers)))
            (raw-content-type (alist-get 'content-type headers))
            (content-type (downcase raw-content-type))
Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.
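For illustration, the case-fold-search suggestion might look something like the sketch below. The helper name my/charset-from-content-type and the regexp are assumptions made up for this example, not the code in the patch; the regexp is deliberately simple and does not handle quoted parameter values.

```elisp
;; Hypothetical helper for illustration only; not part of plz.
(defun my/charset-from-content-type (content-type)
  "Return the charset parameter of CONTENT-TYPE as a string, or nil.
CONTENT-TYPE is a header value such as
\"text/javascript; charset=ISO-8859-1\"."
  ;; Binding `case-fold-search' makes `string-match' ignore case, so
  ;; there is no need to `downcase' a copy of the header first.
  (let ((case-fold-search t))
    (when (string-match "charset=\\([^;[:space:]\"]+\\)" content-type)
      (match-string 1 content-type))))

(my/charset-from-content-type "text/javascript; charset=ISO-8859-1")
;; => "ISO-8859-1"
(my/charset-from-content-type "Text/HTML; Charset=utf-8")
;; => "utf-8"
(my/charset-from-content-type "application/json")
;; => nil
```

The extracted string could then be handed to coding-system-from-name, which, per the PR description, expects a bare coding system name rather than the whole media type.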
Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library? See (info "(emacs-mime) Interface Functions"). For example:
(require 'mail-parse)
(mail-content-type-get
(mail-header-parse-content-type "text/javascript; charset=ISO-8859-1")
'charset)
;; => "ISO-8859-1"
> Alternatively, you could bind case-fold-search around the string-match; might save a string allocation.
Thanks, that's a good idea. @josephmturner Would you like to add that, or should I?
> Yet more alternatively (for more brownie points?), did you consider using the built-in mail-parse library?
No, I wasn't aware of that function. Given the simplicity of this scenario, I think I'd prefer to not have to load that library. Looking at the rfc2231 library it is aliased to, while that library is certainly useful, the function rfc2231-parse-string which gets called looks large, complicated, and would probably be overkill for this. I can't imagine it would help performance to use that function, either. Thanks for pointing it out, though. :)
> I can't imagine it would help performance to use that function
Neither does making HTTP requests ;).
@basil-conto Thank you for reminding me. Would you be willing to contribute a test case for this? That would make it easy to apply this as a fix to the stable release branch.
Sure, but I'd appreciate some pointers: were you thinking of a test that gets
@basil-conto Good question, of course. Ideally we'd have both a unit test for the header parsing, and a test that actually receives non-UTF8-encoded content and verifies that it is decoded properly. If httpbin doesn't make that possible, we might have to get by with a unit test for the header, and a manual test of the functionality, I guess. :)