AIX: pip download package does not work since 21.2
#10858
Comments
It should always be opened in PAX format, where all data is already encoded in UTF-8. |
I do not claim to understand all the details of encoding; however, I know AIX is different from (most/all) Linux.
Sdists are documented as allowing Unicode filenames: "The tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names". This clearly makes it problematic if a filename in a sdist can't be encoded on a given target system. That's a fundamental issue, and at some level is simply a limitation of the target system. Is there a specification anywhere for how a PAX-format tar file should be unpacked on a Unix system which has a limited character set specified? Or even a "common practice"? How does the AIX system tar utility handle this? I don't think there's anything pip-specific to do here; the problem is at a more fundamental level than that. One workaround would be to ask the ansible project not to use Unicode filenames in their sdists. I doubt that we want to prohibit such names in the sdist specification, because nearly all systems in common use are fine with UTF-8 filenames, but if ansible supports AIX, maybe they would be open to using more AIX-friendly filenames? |
For this case specifically, since paths on POSIX are allowed to be any arbitrary byte sequences (unlike on Windows where they are strings and have an encoding), there should be a way to handle this in theory. Whether the added complexity is worth it, and whether it’d actually achieve anything (since the resulting path likely won’t work at compile time anyway), however, is another question. |
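As a minimal sketch of the "arbitrary byte sequences" point, assuming a POSIX system: Python accepts bytes paths, so a file can be created under the raw UTF-8 bytes of a name without any filesystem-encoding step. The helper name `write_raw_utf8_name` is hypothetical, and this is not what pip or `tarfile` do; under a non-UTF-8 locale the resulting name would display as mojibake in most tools.

```python
import os


def write_raw_utf8_name(directory: str, name: str, data: bytes) -> bytes:
    """Create a file named by the UTF-8 bytes of `name`, ignoring the locale.

    POSIX only (assumption): the kernel treats the path as an opaque byte
    string. Returns the raw bytes name that was written.
    """
    raw_name = name.encode("utf-8")  # same bytes a pax "path" record holds
    raw_path = os.path.join(os.fsencode(directory), raw_name)  # bytes + bytes
    with open(raw_path, "wb") as f:  # bytes path: the name is not re-encoded
        f.write(data)
    return raw_name


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        write_raw_utf8_name(d, "nonascii-\u304f\u3089\u3068\u307f.txt", b"Hello")
        # Listing with a bytes path returns raw bytes, not decoded str.
        print(os.listdir(os.fsencode(d)))
```

Whether tools on the target system can then make sense of such a name is exactly the open question raised in this comment.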
What characters are in positions 138-141?
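A `UnicodeEncodeError` message shows a failing run of characters as "position start-end" with an inclusive end, while the exception object stores the same run as `e.start` and an exclusive `e.end`. So "position 138-141" is a run of four characters, presumably the four Hiragana characters in the test filename quoted later in the thread. The sketch below lists such characters for a path and a target encoding; the helper name and the choice of ISO8859-1 as the AIX codeset are assumptions for illustration only.

```python
import unicodedata
from typing import List, Tuple


def unencodable_chars(path: str, encoding: str) -> List[Tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each character of `path`
    that `encoding` cannot represent.

    UnicodeEncodeError exposes the failing run as e.start/e.end (exclusive);
    its message shows a multi-character run as "position start-(end - 1)".
    """
    problems = []
    offset = 0
    while True:
        try:
            path[offset:].encode(encoding)
            return problems
        except UnicodeEncodeError as e:
            for i in range(e.start, e.end):
                ch = e.object[i]
                problems.append((offset + i, ch, unicodedata.name(ch, "<unnamed>")))
            offset += e.end  # resume scanning after the failing run


# Hypothetical example: assume the AIX locale codeset is ISO8859-1.
for index, ch, name in unencodable_chars(
    "test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz", "iso8859-1"
):
    print(index, ascii(ch), name)
```

Running it against the full archive path from the error would report the actual positions, rather than the offsets within this shorter example name.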
Not sure how Python is dealing with this, but I just ran the following. Example:
Can you run with the current |
I ran the install as requested and tried two download commands; the script file is attached. |
Can you not run that within script, and just paste the terminal logs into a GitHub Gist? The script output contains various terminal escape codes, which makes it a lot more work for me to decipher and clean up, especially for a bug on an obscure platform like AIX. |
You mean copy/paste from the screen? Does that work? |
Yep! That'll work! |
Copy/pasted into a file (I hope a file is okay; if not, next time I'll paste it here). |
You'd said you ran:
But in the following console session you ran:
Can you confirm that extracting the archive using tar works? FWIW, the relevant path in the archive is |
Can you also confirm, if extracting the file works, can you
My suspicion is that the AIX tar is going to use the raw (UTF-8 encoded) bytes of the filename and write that to the filesystem. That's going to result in a file that's named using an encoding different from the configured system encoding. I don't know enough about Unix systems to say that's wrong, but at a minimum, it's going to badly confuse a lot of tools... |
That was from memory - as an example
I expect the problem is the filename being recoded (I always get confused with .encode and .decode). As an unencoded string, AIX open() probably doesn't care about 'bytes' such as \u304f, \u3089, \u3068 or \u307f. I tried to find what one of these might be (the 30 part surprised me) and I finally found, for better or worse: Anyway, the tarfile extraction:
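The `\uXXXX` escapes are Unicode code points, not bytes: U+3040 to U+309F is the Hiragana block, which is where the "30" comes from. In Python, `str.encode()` turns text into bytes and `bytes.decode()` goes the other way; in UTF-8, the encoding a pax header uses for file names by default, each of these characters takes three bytes. A small sketch using only the standard `unicodedata` module; the helper name `describe_chars` is hypothetical.

```python
import unicodedata
from typing import List, Tuple


def describe_chars(text: str) -> List[Tuple[str, str, bytes]]:
    """Return (code point, Unicode name, UTF-8 bytes) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.encode("utf-8"))
        for ch in text
    ]


for code_point, name, utf8 in describe_chars("\u304f\u3089\u3068\u307f"):
    # str.encode() produced these bytes; utf8.decode("utf-8") gives the str back.
    print(code_point, name, utf8.hex())
```

The four characters come out as HIRAGANA LETTER KU, RA, TO and MI.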
I think this is what you are looking for:
Update:
Update 2:
#FYI
I'm sorry, but I don't think this is something that pip can fix. The use of UTF-8 to read filenames in sdists is a requirement of the sdist specification, and the change in pip was to conform to that spec. pip can't write the files in the sdist to the filesystem because Python won't write files if the filename can't be encoded in the system filesystem encoding. There are, it seems to me, three possible solutions here:
Personally, I think that (3) (possibly combined with (1)) is the right solution here. |
Not what I meant. What I meant was "If you don't have a UTF-8 filesystem, the standard doesn't explain what tools should do, and there's no easy answer other than to abort. If you want a better answer, it should be documented in the standard for all tools to follow, and not be pip-specific behaviour."
It's not insurmountable; they could create the file in the setup of the test, rather than including it directly. To be 100% clear on why I don't view this as a pip issue, consider the following Python code:

```python
with open("test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz", "w") as f:
    f.write("Hello")
```

I expect that would fail on your system. If Python can't write this file, how do you expect pip to unpack the sdist? (And that's not rhetorical: if you can explain how you'd want that file to be written on your system, in a way that would work as part of the ansible source code, then we could consider how to make that the expected behaviour of tools that read sdists. Remember to consider how your proposal would work for other "limited" encodings, like ASCII or gb2312, or something truly weird like EBCDIC...) |
One thing I'd not focused on is that this is
PEP 643 might help avoid the need to unpack the whole sdist for the metadata. But uptake on that has been slow, so that's not a solution in the short to medium term. |
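A rough sketch of why PEP 643 could help: if an sdist's PKG-INFO declares Metadata-Version 2.2 or later and does not list Requires-Dist under Dynamic, the dependencies can be read from that one member in memory, without extracting the rest of the archive to disk. The helper name and the "<name>-<version>/PKG-INFO" layout check are assumptions for illustration; this is not presented as pip's actual code path.

```python
import email.parser
import tarfile
from typing import BinaryIO, List, Optional


def read_static_requirements(sdist: BinaryIO) -> Optional[List[str]]:
    """Return Requires-Dist from a .tar.gz sdist if PEP 643 marks it static.

    Only the PKG-INFO member is read, in memory; nothing is written to disk.
    Returns None if PKG-INFO is missing, Metadata-Version is older than 2.2,
    or Requires-Dist is listed under Dynamic (a build would still be needed).
    """
    with tarfile.open(fileobj=sdist, mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Assumed layout: "<name>-<version>/PKG-INFO" at the top level.
            if len(parts) == 2 and parts[1] == "PKG-INFO" and member.isfile():
                fileobj = tar.extractfile(member)
                data = fileobj.read() if fileobj else b""
                break
        else:
            return None
    msg = email.parser.BytesParser().parsebytes(data)
    raw_version = msg["Metadata-Version"] or "1.0"
    version = tuple(int(part) for part in raw_version.split("."))
    dynamic = {field.lower() for field in msg.get_all("Dynamic", [])}
    if version < (2, 2) or "requires-dist" in dynamic:
        return None
    return msg.get_all("Requires-Dist", [])
```

Usage would be along the lines of opening the downloaded `.tar.gz` in binary mode and passing the file object; a `None` result means the metadata still has to be generated the usual way.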
To be clear, it's not a "prebuild"; it's how we get the metadata for the project, so your suggested
All I can think of is that you use (or more likely write, as I don't know of one that exists) a tool to determine, from the package dependency graph, what you need to download. And then use
Or pin pip to 21.1, which may be the simplest option. |
AIX 6.1 (so quite old!)
For |
Thx. On vacation. Question: would it be an idea to use |
Internally pip uses the stdlib tarfile module. We're not going to shell out to an external program for this, as that adds a whole bunch of risk that we don't want to take. |
Hmm. It feels like something was missed when the PEP covering this was written, specifying pax conformity (again, since pax was specified to cover the requirement while tar was not). It's beginning to feel like a no-win situation. I wonder how it is going to be looked at down the road (since Python 3.9 is part of the AIX 7.3 BOS). Not your problem, nor mine; just a hassle for me at the moment. Thanks for the feedback. |
P.S. Maybe there is something in https://github.com/python/cpython/blob/3.10/Lib/tarfile.py (e.g. pax_headers=xxx), but studying it exceeds my vacation time for comments. :) |
The pax conformity required by the PEP is provided by the tarfile module. I remember we explicitly checked this when we wrote the PEP. The problem here has nothing to do with the format; it's because some files that can be stored in pax format can't be extracted if the filesystem/OS encoding doesn't support writing files with that name. The AIX pax utility appears to do something at the bytes level for unencodable filenames, which is not part of the pax format specification. |
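To illustrate the storing-versus-extracting distinction drawn here, the sketch below reads member names with `tarfile`, which decodes pax "path" records as UTF-8 (unless the archive declares otherwise), so listing them does not depend on the local encoding. It then reports names that cannot be turned into bytes in a given encoding, which is the point where extraction would fail. It defaults to `sys.getfilesystemencoding()`; the helper name `unextractable_members` is hypothetical, and pip does not perform such a pre-check.

```python
import sys
import tarfile
from typing import List, Optional


def unextractable_members(archive: str, encoding: Optional[str] = None) -> List[str]:
    """Return names of members whose paths cannot be encoded in `encoding`.

    Defaults to sys.getfilesystemencoding(), the codec Python uses to turn a
    str path into bytes for the OS when creating a file.
    """
    encoding = encoding or sys.getfilesystemencoding()
    bad = []
    with tarfile.open(archive) as tar:  # mode "r" auto-detects compression
        for member in tar.getmembers():
            try:
                # surrogateescape matches Python's usual POSIX error handler.
                member.name.encode(encoding, "surrogateescape")
            except UnicodeEncodeError:
                bad.append(member.name)
    return bad
```

On a UTF-8 system this returns an empty list for the ansible sdist; on a system whose filesystem encoding cannot represent the Hiragana characters, it would list the test file discussed above.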
Well, I am on vacation, so I cannot test anything at the moment. But what should I see to verify that the
"The pax conformity required by the PEP is provided by the tarfile module." Yet I recall reading that people expected more issues with file systems that do not support UTF-8 natively. I only hope to find a solution: if it is a bug in AIX that needs addressing (the pax implementation is wrong), I'll go for it, but the tarfile module exceeds my current comprehension of pax and Python (I merely try to package Python and modules, not develop them). I just fear several issues for AIX and Python down the road, and expect them to be more noticeable because Python is included in AIX 7.3 (and my role as packager goes away). Bottom line: if I can help, I'll do what I can, and I ask for your understanding when helping is beyond my abilities. |
I don't think there's much to do here. My reading of the situation is as follows:
Configuring an AIX system to use UTF-8 as the filesystem encoding (like most Linux systems do these days) would remove the issue, but it's a global change that would have much wider implications. Asking the ansible project to avoid Unicode filenames might be a practical workaround, too. It's unlikely pip will be modified to handle this case; it's basically far too rare, and there's no reasonable behaviour that would be guaranteed safe. We don't want to get into those sorts of heuristics: we follow the behaviour of Python's
I'm inclined to close this as "no action" / "out of scope" (basically "won't fix", but we don't have a label for that) because I don't think there's any change to pip that we'd be willing to make to handle this. |
I pretty much agree with @pfmoore here, so I've gone ahead and applied the
AIX is not a platform that pip supports anyway (https://pip.pypa.io/en/stable/installation/#compatibility). I'll take this as a good reminder to update our docs to note that pip working on unsupported platforms (e.g. AIX) is considered incidental and "it runs != it is supported". |
Description
I recently needed to update a number of packages and ran into a problem: I could not download ansible-base (so I suspect it is an issue with any packages that are not pure Python).
So I rolled back pip to a much older version (20.2.4), and all was okay.
I then updated pip in increments; 21.1.3 was the last version that worked as expected (as far as download is concerned; I have not tried anything else).
Where I think the regression occurred
Expected behavior
The ansible I have been using for two years is based on 2.10.1, so I tried that version again. py36 is only being used because it is known to be working, and I was looking for when the regression appeared.
pip version
21.3.1, 21.2.4, 21.2
Python version
3.6, 3.9
OS
AIX
How to Reproduce
Output