Test failures due to .es file name suffix for text/javascript file #75
Comments
I have filed https://bugs.debian.org/1061477 "Use js as first file name suffix for text/javascript". However, Debian 12 bookworm (the current stable release) is affected, so a workaround on the application side is desired. |
Thank you for the information and report. During the investigation I found something bizarre. According to Python source code,
When loading any of the above map, This seems to be true for Python 3.12 to 3.7, and I currently cannot figure out why your system doesn't do so... Can you provide your Python implementation for further investigation? (or better, check the source code) |
Prior to Python 3.12 it was
I was disappointed by first win vs. last win discrepancy as well. My first idea was to suggest Perhaps it is possible to extend |
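For readers following the thread, the first-win versus last-win discrepancy mentioned above can be reproduced without touching any system file. The snippet below is a minimal sketch, not code from the project: the media types `application/x-demo-one` and `application/x-demo-two` and the suffixes `first` and `second` are invented for illustration, and a separate `mimetypes.MimeTypes` instance is used instead of the module-level database.

```python
import io
import mimetypes

# Hypothetical media types and suffixes, used only for illustration.
SAMPLE = """\
application/x-demo-one   first second
application/x-demo-two   first
"""

db = mimetypes.MimeTypes()  # separate database instance
db.readfp(io.StringIO(SAMPLE))

# Media type -> suffix: the first suffix registered for the type is returned.
first_suffix = db.guess_extension("application/x-demo-one")
all_suffixes = db.guess_all_extensions("application/x-demo-one")

# Suffix -> media type: the entry read last replaces earlier ones.
last_type, _ = db.guess_type("stub.first")

print(first_suffix, all_suffixes, last_type)
```

Note that `.first` stays in the suffix list of the first media type even after the second line has claimed it, which is why the two directions can disagree.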
Sorry, I did not check versions < 3.12 carefully enough and failed to notice that it's I wonder whether the last-win and first-win behavior is implemented the way most mime.types-like files work. Unfortunately the native I implemented a quick patch that may have fixed the issue; feel free to check whether it does the job. It is still suboptimal, though, as it accesses the hidden |
Danny, have you considered the following workaround?
I have tried the following:

#!/usr/bin/env python3
import mimetypes
import os.path

user_mime_types = os.path.join(os.path.expanduser('~'), '.mime.types')
pkg_mime_types = os.path.join(os.path.dirname(__file__), 'mime.types')

def _init(files):
    mimetypes.knownfiles[:] = files + mimetypes.knownfiles.copy() + list(reversed(files))
    mimetypes.init()

_init([user_mime_types, pkg_mime_types])

if __name__ == '__main__':
    import sys
    for typ in sys.argv[1:]:
        all = mimetypes.guess_all_extensions(typ)
        print(f'{typ}: {all}')
        for ext in all:
            name = 'stub' + ext
            print(f'{ext}: {mimetypes.guess_type(name)}')

It reports
However with
it gives desired result
So the only complication is shipping an extra resource file. |
There are already known issues in the Python default mapping and the Windows registry, which are loaded before the knownfiles. Although they are ext=>MIME cases and can be overwritten, we can't guarantee there won't be a MIME=>ext case in the future. The solution should be able to cover such potential cases, I think. |
Could you, please, be more specific? Do you mean the following?
I have reread

types_map_overrides = {
    '.js': 'text/javascript',
    '.es': 'text/ecmascript'
}

def _init2(files=None, overrides=None):
    # First wins for media type to extension mapping.
    if overrides:
        mimetypes.types_map.update(overrides)
    # Load Windows registry and POSIX /etc/mime.types.
    mimetypes.init()
    if files:
        mimetypes.init(files)
    # Ensure stable mapping of extension to media type.
    if overrides:
        for ext, typ in overrides.items():
            mimetypes.add_type(typ, ext)

_init2([user_mime_types], types_map_overrides)

It seems, first wins for media type to file name suffix is a bug:
|
I'm considering the similar approach. See the latest commits. |
See the code comments, some patches are explicitly for such issues.
I'm not very sure about what "those named before it" means. But I think raising an issue to the Python developers for this should be OK. |
Danny, f47648f breaks tests because |
Have you tested it? This shouldn't happen as it will be overwritten again by the add_type. |
Unfortunately, there is an issue with the "patch default types_map" approach: This approach won't work if Another problem is how we should deal with the user mime config. Should we allow it to overwrite our patch for native Python and OS configs? Should we allow it to overwrite the MIME=>ext mapping in a last-win manner? Applying the patch and user config when |
Reworked, see the latest commits. It seems that patching the internal _db is still the easiest and safest way to do the job. Please check if it works on your device. |
The rabbit hole is deeper than I expected. I do not think you need tricks with
(A side note. I do not like the I still prefer the
In my opinion, ideally the priorities should be the following (from higher to lower):
As a workaround for the first wins bug in the case of media types to extension mapping, perhaps it is possible to use |
First of all: can you confirm whether the current commit works on your device?
I forgot to revise the related code comments, which is for the previous "patch default types_map" implementation, which permanently changes the default map and can not be reset without reloading the module, and is no longer valid for the current "patch _db" implementation. However, reloading the module is still required since in Python < 3.7.5 there are no default maps and any loaded config files permanently change the default maps, as well as
I don't know what is the exact issue you care about, before you elaborate it. The referenced issue is so old and no longer valid since Python 3.7.5, I think.
It is not quite easy to ensure that all modules possibly call
Reading |
Yes, I do
f15fa0e 2024-01-26 00:22:37 +0800 Danny Lin: Rework mimetypes handling
It is unfortunate if you still have to support 3.7. According to https://devguide.python.org/versions/ that branch reached end-of-life status on 2023-06-27. |
Good. We probably are going on this way.
I know this. But we would avoid dropping previously supported versions too early. Also, currently the implementation is one-time, i.e. not inherited by new |
Forgot to say, we don't need to patch |
Ack. I see usage of |
Reloading the module may be important to test namely patching of mapping. Otherwise However it does not really matter, and I agree that the approach I suggest is a trade-off. It avoids using private API, but the price is that config files are processed twice. It may be reconsidered later if another issue is found. |
This is not true. Try this in Python 3.7.4:
Apparently re-running Also, as mentioned before: our polyfill is one-time, and reinit will remove all our patches. Although we can rewrite the script to make it apply again during reinit, we currently don't see a strong rationale to do so at the cost of complicating the code. Accessing a hidden attribute is not uncommon for a patch or polyfill script. We would try to avoid it, but only when there is not too much extra cost. Slightly updated the code, which should be the final version unless there is a strong real-world use case that suggests a further change. |
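The remark that a reinit removes runtime patches can be illustrated with public API only. This is a minimal sketch under two assumptions: it runs on Python 3.7.5 or later, where `mimetypes.init()` without arguments builds a new database, and the media type `application/x-demo` and suffix `.xdemoext` are invented names that no system mime.types file defines.

```python
import mimetypes

# Hypothetical media type and suffix, assumed absent from system files.
DEMO_TYPE = "application/x-demo"
DEMO_EXT = ".xdemoext"

mimetypes.init()
mimetypes.add_type(DEMO_TYPE, DEMO_EXT)
before = mimetypes.guess_type("stub" + DEMO_EXT)[0]

# A plain re-init builds a new database from the defaults and known files
# (Python 3.7.5+), so the runtime addition above is no longer present.
mimetypes.init()
after = mimetypes.guess_type("stub" + DEMO_EXT)[0]

print(before, after)
```

Any code that calls `mimetypes.init()` later therefore has to reapply such additions, which is the cost weighed in the comment above.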
I do not see any issue for the application or if you are going to test result of The real issue is that behavior is different from newer versions. I do not like that there is no way to specify directly what Another problem is that Python-3.7.3 does not support incremental I do not think it is necessary to add workarounds against code invoking I have made another attempt to use purely public API. The result is fragile, but it seems to work in 3.11 and 3.7.3. It assumes that wsb custom mappings have lower priority than local configuration files.

import mimetypes


class MimeTypesSafeRead(mimetypes.MimeTypes):
    """The constructor's `filenames` argument may list missing files"""

    def read(self, filename, strict=True):
        """Ignore inaccessible files"""
        try:
            f = open(filename, encoding='utf-8')
        except OSError:
            return None
        with f:
            self.readfp(f, strict)


def _init(files=None, overrides=None):
    """A workaround for first wins for media type to extension mapping"""
    if files is not None and not len(files):
        files = None
    if overrides is not None and not len(overrides):
        overrides = None
    if files is None and overrides is None:
        return
    override_types_map = None
    saved_types_map_default = None
    saved_types_map = None
    # `MimeTypes` constructor uses `mimetypes.types_map` in Python-3.7.3
    # and `mimetypes._types_map_default` in 3.11.
    # Fragile, `mimetypes.init()` decouples `mimetypes.types_map`
    # and `mimetypes._types_map_default` e.g. in Python-3.11.
    internal_types_map = mimetypes.types_map
    try:
        saved_knownfiles = None
        try:
            saved_types_map = mimetypes.types_map.copy()
            # Build combined `types_map` for `files` and `overrides`.
            # `MimeTypes` constructor below may call `mimetypes.init()`.
            # Suppress loading of system files, it will be done later.
            # There is no way to avoid loading of Windows registry.
            if not mimetypes.inited:
                saved_knownfiles = mimetypes.knownfiles.copy()
                mimetypes.knownfiles.clear()
                mimetypes.init()
            saved_types_map_default = internal_types_map.copy()
            # Avoid module and system defaults in the override map.
            internal_types_map.clear()
            mimetypes.types_map.clear()
            if overrides:
                internal_types_map.update(overrides)
                mimetypes.types_map.update(overrides)
            override_types_map = MimeTypesSafeRead(files or ()).types_map[True]
            # Set overrides before loading system settings.
            internal_types_map.clear()
            mimetypes.types_map.clear()
            internal_types_map.update(saved_types_map_default)
            internal_types_map.update(override_types_map)
            mimetypes.types_map.update(internal_types_map)
            if not overrides:
                override_types_map = None
        finally:
            if saved_knownfiles:
                mimetypes.knownfiles[:] = saved_knownfiles
        # Modify `knownfiles` to get at least some overrides if some code
        # will call `mimetypes.init()` later.
        if files:
            mimetypes.knownfiles.extend(files)
        # Replaces `mimetypes.types_map`.
        mimetypes.init()
        saved_types_map = None
    finally:
        # Something went wrong. Restore initial state.
        if saved_types_map_default:
            internal_types_map.clear()
            internal_types_map.update(saved_types_map_default)
        if saved_types_map:
            mimetypes.types_map.clear()
            mimetypes.types_map.update(saved_types_map)
            mimetypes.init()
    # Windows registry or system files may overwrite entries from `overrides`.
    # Python < 3.7.5 does not support incremental `init()`
    # bpo-4963 https://github.com/python/cpython/issues/49213
    # So it is impossible to apply `overrides` in between of loading
    # of system configuration and `files`
    # and it is necessary to apply combined overrides instead.
    if override_types_map:
        for ext, typ in override_types_map.items():
            mimetypes.add_type(typ, ext)
|
- Fix an issue that native (Python and platform) MIME=>ext conversions are not patched.
- Allow user config to overwrite the patch.
- Patch mimetypes.init() to defer patching and improve the performance, and allow unittest mockings for the user config directory to work for the mimetypes module.
I'm going to close this issue. Our goal is to fix the instability caused by some bad platform rules while maintaining the largest compatibility with previous versions. We are not going to complicate things unless there is a significant real-world use case. After all, the user can always fix any mimetype conversion issue by modifying the platform rules directly or by adding a little code in the launcher script (such as the |
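As an illustration of the kind of small fix a user script could apply, here is a minimal sketch that pins the preferred suffix for one media type. It is not the project's actual patch: `prefer_extension` is a hypothetical helper, it works on a separate `mimetypes.MimeTypes` instance rather than on the module-level database, and it relies only on the documented `types_map` and `types_map_inv` attributes.

```python
import io
import mimetypes


def prefer_extension(db, media_type, ext):
    """Make `ext` the first suffix for `media_type` in `db` (hypothetical helper)."""
    exts = db.types_map_inv[True].setdefault(media_type, [])
    if ext in exts:
        exts.remove(ext)
    exts.insert(0, ext)
    # Keep the suffix -> media type direction consistent as well.
    db.types_map[True][ext] = media_type


db = mimetypes.MimeTypes()
# Simulate a platform file that lists `es` before `js`.
db.readfp(io.StringIO("text/javascript es js\n"))
prefer_extension(db, "text/javascript", ".js")

print(db.guess_extension("text/javascript"))
print(db.guess_type("stub.js")[0])
```

The `.es` suffix remains registered for the media type; only its position in the list changes, so lookups by file name are unaffected.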
Thank you, Danny. My attempt to make media type to extension mapping configurable out of the box: |
The following tests may fail:
tests.test_scrapbook_indexer.TestUnSingleHtmlConverter.test_rewrite_svg
tests.test_scrapbook_indexer.TestUnSingleHtmlConverter.test_rewrite_svg_file
Originally I mentioned it in #34 (comment)
I think the origin of the issue is
due to
It is specific to Debian Linux and its derivatives
https://salsa.debian.org/debian/media-types/-/commit/f67681aabbce05378e0ac45f92d398f8b7a31d5e
while RedHat-based distributions (Fedora, etc.) are not affected
https://pagure.io/mailcap/blob/master/f/mime.types#_1990
While I believe that it was oversight of the Debian media-types project to put es suffix before js, PyWebScrapBook should either explicitly set text/javascript to .js mapping or should allow .es in tests.