AIX: pip download package does not work since 21.2 #10858

Closed
1 task done
aixtools opened this issue Jan 31, 2022 · 36 comments
Labels
resolution: no action When the resolution is to not do anything

Comments

@aixtools

aixtools commented Jan 31, 2022

Description

Recently got started with needing to update a number of packages - and ran into a problem that I could not download ansible-base (so I suspect it is an issue with any packages that are not pure Python).

So, rolled back pip to a much older version (20.2.4) and all was okay.

In increments I updated pip and 21.1.3 was the last version that worked as expected (as far as download is concerned, have not tried anything else).

Where I think the regression occurred

aixtools@x064:[/data/prj/python/git/pip]git diff 21.1.3 21.2.1 -- ./src/pip/_internal/utils/unpacking.py
diff --git a/src/pip/_internal/utils/unpacking.py b/src/pip/_internal/utils/unpacking.py
index 44ac47535..bffb3cd65 100644
--- a/src/pip/_internal/utils/unpacking.py
+++ b/src/pip/_internal/utils/unpacking.py
@@ -178,7 +178,7 @@ def untar_file(filename, location):
             filename,
         )
         mode = "r:*"
-    tar = tarfile.open(filename, mode)
+    tar = tarfile.open(filename, mode, encoding="utf-8")
     try:
         leading = has_leading_dir([member.name for member in tar.getmembers()])
         for member in tar.getmembers():
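
To illustrate what the `encoding` argument in the diff above changes: when no encoding is passed, `tarfile` falls back to the filesystem encoding, so on a system reporting `iso8859-1` the member names in a non-PAX archive are decoded differently. The following is a minimal sketch, not pip's code; the helper names and the `caf\u00e9.txt` member are made up for the example.

```python
import io
import tarfile


def make_gnu_archive(name):
    """Build an in-memory GNU-format tar with one empty member.

    The member name is stored in the header as UTF-8 bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def member_names(archive_bytes, encoding):
    """List member names, decoding header bytes with the given encoding."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*",
                      encoding=encoding) as tar:
        return [member.name for member in tar.getmembers()]


archive = make_gnu_archive("caf\u00e9.txt")  # hypothetical member name
print(member_names(archive, "utf-8"))       # name as the packager wrote it
print(member_names(archive, "iso8859-1"))   # same bytes, read as Latin-1
```

With `encoding="utf-8"` the name comes back unchanged; with `iso8859-1` the two UTF-8 bytes of the accented letter are read as two separate Latin-1 characters.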

Expected behavior

  • Note: I get the same problem as above when trying to download ansible-base==2.10.16, but the ansible I have been using for two years is based on 2.10.1 - so I tried that version again.
  • I saw you are withdrawing support for py36 - so, please note, this is not a request for py36 support. I first saw this on Python3-9 which is what I wanted to update. py36 is only being used because that is known to be working - and I was looking for when the regression appeared.
(py360) aixtools@x064:[/home/aixtools/download/py360]pip download ansible-base==2.10.1
Collecting ansible-base==2.10.1
  Using cached ansible-base-2.10.1.tar.gz (6.0 MB)
Collecting jinja2
  Using cached Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting PyYAML
  Using cached PyYAML-6.0.tar.gz (124 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting cryptography
  Using cached cryptography-36.0.1.tar.gz (572 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting packaging
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     |################################| 40 kB 44 kB/s
Collecting cffi>=1.12
  Using cached cffi-1.15.0.tar.gz (484 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.0.1.tar.gz (18 kB)
Collecting pyparsing!=3.0.5,>=2.0.2
  Downloading pyparsing-3.0.7-py3-none-any.whl (98 kB)
     |################################| 98 kB 286 kB/s
Saved ./ansible-base-2.10.1.tar.gz
Saved ./cryptography-36.0.1.tar.gz
Saved ./cffi-1.15.0.tar.gz
Saved ./Jinja2-3.0.3-py3-none-any.whl
Saved ./MarkupSafe-2.0.1.tar.gz
Saved ./packaging-21.3-py3-none-any.whl
Saved ./pyparsing-3.0.7-py3-none-any.whl
Saved ./pycparser-2.21-py2.py3-none-any.whl
Saved ./PyYAML-6.0.tar.gz

pip version

21.3.1, 21.2.4, 21.2

Python version

3.6, 3.9

OS

AIX

How to Reproduce

  1. System Admin Installs Python3-9 (or 3-6) including virtualenv
  2. Run the remaining steps as a regular user (no root privileges)
  3. virtualenv py360
  4. mkdir -p downloads/py360
  5. . py360/bin/activate
  6. cd downloads/py360
  7. pip3 download ansible-base==2.10.1

Output

- Example:

(py360) aixtools@x064:[/home/aixtools/download/py360]pip install pip --upgrade
Requirement already satisfied: pip in /home/aixtools/py360/lib/python3.6/site-packages (21.1.3)
Collecting pip
  Using cached pip-21.3.1-py3-none-any.whl (1.7 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-21.3.1
(py360) aixtools@x064:[/home/aixtools/download/py360]pip download ansible-base==2.10.1
Collecting ansible-base==2.10.1
  File was already downloaded /home/aixtools/download/py360/ansible-base-2.10.1.tar.gz
ERROR: Exception:
Traceback (most recent call last):
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 164, in exc_logging_wrapper
    status = run_func(*args)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/commands/download.py", line 128, in run
    requirement_set = resolver.resolve(reqs, check_supported_wheels=True)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 93, in resolve
    collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 482, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 349, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    version=version,
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 287, in __init__
    version=version,
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 292, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 482, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 528, in _prepare_linked_requirement
    link, req.source_dir, self._download, self.download_dir, hashes
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 223, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 247, in unpack_file
    untar_file(filename, location)
  File "/home/aixtools/py360/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 218, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 138-141: ordinal not in range(256)


@aixtools added the "S: needs triage" and "type: bug" labels on Jan 31, 2022
@pradyunsg changed the title from "Cannot download a module using pip download package on AIX since version 21.2" to "AIX: pip download package does not work since 21.2" on Jan 31, 2022
@q0w
Contributor

q0w commented Feb 1, 2022

The archive should always be opened as PAX format, where all data is already encoded in UTF-8.
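
A small sketch of that point, as far as I understand the standard library's behaviour: in a PAX archive a non-ASCII member name is stored in an extended header as UTF-8, and `tarfile` decodes it as UTF-8 whatever `encoding` the reader passes. The member name and helper below are made up for illustration.

```python
import io
import tarfile

NAME = "test-\u304f\u3089\u3068\u307f.txt"  # hypothetical member name

# Write a PAX-format archive with one empty, non-ASCII-named member.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo(NAME))
PAX_ARCHIVE = buf.getvalue()


def pax_names(data, encoding):
    """Read member names back while asking tarfile for a given encoding."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*",
                      encoding=encoding) as tar:
        return [member.name for member in tar.getmembers()]


# The PAX header wins over the requested encoding in both cases.
print(pax_names(PAX_ARCHIVE, "utf-8") == [NAME])
print(pax_names(PAX_ARCHIVE, "iso8859-1") == [NAME])
```

So for a PAX sdist the name is read correctly either way; the failure only appears later, when the decoded name has to be written to a filesystem that cannot represent it.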

@aixtools
Author

aixtools commented Feb 1, 2022

I do not claim to understand all the details of encoding, however, I know AIX is different from (most/all) Linux.

aixtools@x064:[/home/aixtools]python
Python 3.6.12 (default, Sep 23 2020, 08:27:01) [C] on aix5
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'iso8859-1'
  • From reading Python documentation re: encoding I understand that Python wants/prefers all text files to be encoded as utf-8.
  • TAR files are binary, not text.
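
For readers unsure how the two values in the session above differ, here is a minimal sketch. The helper names are made up; the `sys` and `os` calls are standard library.

```python
import os
import sys


def describe_encodings():
    """Return the two encodings printed in the interactive session."""
    return {
        # Codec used by str.encode()/bytes.decode() when none is given;
        # this is always UTF-8 on Python 3.
        "default": sys.getdefaultencoding(),
        # Codec used to turn str paths into the bytes handed to the OS.
        "filesystem": sys.getfilesystemencoding(),
    }


def path_as_os_bytes(path):
    """Convert a str path to bytes the same way open() does internally."""
    return os.fsencode(path)


print(describe_encodings())
print(path_as_os_bytes("setup.py"))
```

Only the "filesystem" value matters for filenames, which is why a system reporting `iso8859-1` there can hit a `'latin-1' codec` error even though the default encoding is UTF-8.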

@pradyunsg
Member

This was done in #9569, to fix #7667.

@pradyunsg added the "state: needs eyes" label and removed the "type: bug" and "S: needs triage" labels on Feb 1, 2022
@aixtools
Author

aixtools commented Feb 1, 2022

Python 3.9.9 (main, Jan 27 2022, 08:00:01) [C] on aix
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'iso8859-1'
>>>
(py39) aixtools@x064:[/home/aixtools/download/py39]pip3 list
Package    Version
---------- -------
pip        21.3.1
setuptools 60.2.0
wheel      0.37.1
WARNING: You are using pip version 21.3.1; however, version 22.0 is available.
You should consider upgrading via the '/home/aixtools/py39/bin/python -m pip install --upgrade pip' command.
(py39) aixtools@x064:[/home/aixtools/download/py39]pip download ansible-base==2.10.1
Collecting ansible-base==2.10.1
  Using cached ansible-base-2.10.1.tar.gz (6.0 MB)
ERROR: Exception:
Traceback (most recent call last):
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 164, in exc_logging_wrapper
    status = run_func(*args)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/commands/download.py", line 128, in run
    requirement_set = resolver.resolve(reqs, check_supported_wheels=True)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 482, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 349, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 201, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 281, in __init__
    super().__init__(
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 292, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 482, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 527, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 223, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/utils/unpacking.py", line 247, in unpack_file
    untar_file(filename, location)
  File "/home/aixtools/py39/lib/python3.9/site-packages/pip/_internal/utils/unpacking.py", line 218, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 138-141: ordinal not in range(256)
WARNING: You are using pip version 21.3.1; however, version 22.0 is available.
You should consider upgrading via the '/home/aixtools/py39/bin/python -m pip install --upgrade pip' command.

@pfmoore
Member

pfmoore commented Feb 1, 2022

Sdists are documented as allowing Unicode filenames: "The tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names". This clearly makes it problematic if a filename in a sdist can't be encoded on a given target system. That's a fundamental issue, and at some level is simply a limitation of the target system.

Is there a specification anywhere for how a PAX-format tar file should be unpacked on a Unix system which has a limited character set specified? Or even a "common practice"? How does the AIX system tar utility handle ansible-base-2.10.1.tar.gz? If there's a standard approach, maybe Python's tarfile module should be using it (which would solve the problem for pip).

I don't think there's anything pip-specific to do here, the problem is at a more fundamental level than that.

One workaround would be to ask the ansible project to not use Unicode filenames in their sdists. I doubt that we want to prohibit such names in the sdist specification, because nearly all systems in common use are fine with UTF-8 filenames, but if ansible support AIX, maybe they would be open to using more AIX-friendly filenames?

@uranusjr
Member

uranusjr commented Feb 3, 2022

This clearly makes it problematic if a filename in a sdist can't be encoded on a given target system. That's a fundamental issue, and at some level is simply a limitation of the target system.

For this case specifically, since paths on POSIX are allowed to be any arbitrary byte sequences (unlike on Windows where they are strings and have an encoding), there should be a way to handle this in theory. Whether the added complexity is worth it, and whether it’d actually achieve anything (since the resulting path likely won’t work at compile time anyway), however, is another question.
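
A rough sketch of the "in theory" route, assuming a POSIX system: pass the path to `open()` as bytes, so Python never consults the filesystem encoding. This is not something pip does; the function name and the sample filename are hypothetical.

```python
import os
import tempfile


def write_with_bytes_name(directory, name, data):
    """Create a file whose name is the raw UTF-8 bytes of `name`.

    Because the path is bytes, no str-to-bytes conversion with the
    filesystem encoding takes place.
    """
    raw_dir = os.fsencode(directory)
    raw_path = os.path.join(raw_dir, name.encode("utf-8"))
    with open(raw_path, "wb") as fp:
        fp.write(data)
    return raw_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_with_bytes_name(
        tmp, "nonascii-\u304f\u3089\u3068\u307f.txt", b"Hello"
    )
    # Listing with a bytes directory returns bytes names, undecoded.
    print(os.listdir(os.fsencode(tmp)))
```

The file then exists under a name that does not round-trip through the configured encoding, which is the downside raised in the comment: later tools that handle the path as text may still fail on it.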

@aixtools
Author

aixtools commented Feb 3, 2022

What characters are in position 138-141?
: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 138-141: ordinal not in range(256)

How does the AIX system tar utility handle ansible-base-2.10.1.tar.gz?

Not sure how python is dealing with this - but I just run: gzip -dc ansible-base-2.10.1.tar.gz | /usr/bin/tar -xf -

Example:

[email protected]:[/home/aixtools]e2ab195d0c2736992a3a9aeca1ddbefebee554226d211267/ansible-base-2.10.17.tar.gz                                                  <
--2022-02-03 08:23:29--  https://files.pythonhosted.org/packages/fe/56/b18bf0167aa6e2ab195d0c2736992a3a9aeca1ddbefebee554226d211267/ansible-base-2.10.17.tar.gz
Resolving files.pythonhosted.org... 151.101.53.63
Connecting to files.pythonhosted.org|151.101.53.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6135744 (5.9M) [application/x-tar]
Saving to: 'ansible-base-2.10.17.tar.gz'

ansible-base-2.10.17.tar.gz              100%[================================================================================>]   5.85M  --.-KB/s    in 0.09s

2022-02-03 08:23:29 (63.4 MB/s) - 'ansible-base-2.10.17.tar.gz' saved [6135744/6135744]

[email protected]:[/home/aixtools]gzip -dc ansible-base-2.10.17.tar.gz | tar tf - | wc -l
    6241
  • So it does not seem to be any issue with the tarfile, or even the name.
  • It feels like the .encode (or .decode) is just complaining, i.e., the problem does not appear to be the file, nor the filename.

@pradyunsg
Member

pradyunsg commented Feb 3, 2022

Can you run with the current main branch (pip install https://github.com/pypa/pip/archive/refs/heads/main.zip) with PIP_DEBUG=1 in the environment?

@aixtools
Author

aixtools commented Feb 3, 2022

  • I started with:
aixtools@x064:[/home/aixtools]/opt/aixtools/bin/virtualenv --clear py39
created virtual environment CPython3.9.9.final.0-64 in 6899ms
  creator CPython3Posix(dest=/home/aixtools/py39, clear=True, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/aixtools/.local/share/virtualenv)
    added seed packages: pip==21.3.1, setuptools==60.2.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
aixtools@x064:[/home/aixtools]. py39/bin/activate
(py39) aixtools@x064:[/home/aixtools]pip list
Package    Version
---------- -------
pip        21.3.1
setuptools 60.2.0
wheel      0.37.1
WARNING: You are using pip version 21.3.1; however, version 22.0 is available.
You should consider upgrading via the '/home/aixtools/py39/bin/python -m pip install --upgrade pip' command.

(py39) aixtools@x064:[/home/aixtools]mkdir py39/test
(py39) aixtools@x064:[/home/aixtools]cd py39/test
(py39) aixtools@x064:[/home/aixtools/py39/test]export PIP_DEBUG=1
(py39) aixtools@x064:[/home/aixtools/py39/test]script pip-debug.txt
Script command is started. The file is pip-debug.txt.

Then I ran the install as requested and tried two download commands:
pip download ansible-base and pip download ansible-base==2.10.14

script file attached.
pip-debug.txt

@pradyunsg
Member

pradyunsg commented Feb 3, 2022

Can you not run that within script, and just paste the terminal logs into a GitHub Gist?

The script output contains various terminal escape codes, which is a lot more work for me to spend my time trying to decipher/clean up, especially for a bug on an obscure platform like AIX.

@aixtools
Author

aixtools commented Feb 3, 2022

Can you not run that within script, and just paste the terminal logs into a GitHub Gist?

The script output contains various terminal escape codes, which is a lot more work for me to spend my time trying to decipher/clean up, especially for a bug on an obscure platform like AIX.

You mean copy/paste from the screen - does that work?

@pradyunsg
Member

You mean copy/paste from the screen - does that work?

Yep! That'll work!

@aixtools
Author

aixtools commented Feb 3, 2022

Copy/paste into a file (hope a file is okay, if not next time I'll paste here).
pip-debug.txt

@pradyunsg
Member

You'd said you ran:

gzip -dc ansible-base-2.10.1.tar.gz | /usr/bin/tar -xf -

But in the following console session you ran:

gzip -dc ansible-base-2.10.17.tar.gz | tar tf - | wc -l

Can you confirm that extracting the archive works, using tar?

FWIW, the relevant path in the archive is test/integration/targets/unarchive/files/test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.g… (the … denotes truncation)

@pfmoore
Member

pfmoore commented Feb 3, 2022

Can you also confirm, if extracting the file works, can you ls it?

My suspicion is that the AIX tar is going to use the raw (UTF-8 encoded) bytes of the filename and write that to the filesystem. That's going to result in a file that's named using an encoding different from the configured system encoding.

I don't know enough about Unix systems to say that's wrong, but at a minimum, it's going to badly confuse a lot of tools...
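
To make the suspected scenario concrete (this illustrates the hypothesis only, not what AIX tar actually does): if the raw UTF-8 bytes of the name end up on disk and are later interpreted as ISO 8859-1, the four characters turn into twelve unrelated ones. The function name is made up for the example.

```python
def reinterpret(name, written_as="utf-8", read_as="iso8859-1"):
    """Encode a name one way and decode the same bytes another way."""
    return name.encode(written_as).decode(read_as)


original = "\u304f\u3089\u3068\u307f"
seen = reinterpret(original)

# Each of the four characters occupies three bytes in UTF-8, and
# ISO 8859-1 maps every byte to exactly one character.
print(len(original), len(seen))

# The bytes are intact, so reversing the steps recovers the name.
print(seen.encode("iso8859-1").decode("utf-8") == original)
```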

@aixtools
Author

aixtools commented Feb 3, 2022

You'd said you ran:

gzip -dc ansible-base-2.10.1.tar.gz | /usr/bin/tar -xf -

That was from memory - as an example

But in the following console session you ran:

gzip -dc ansible-base-2.10.17.tar.gz | tar tf - | wc -l
I just wanted to show I could download the current file and list it with no issues.

Can you confirm that extracting the archive works, using tar?
Yes, that works too.

FWIW, the relevant path in the archive is test/integration/targets/unarchive/files/test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.g… (the … denotes truncation)

I expect the problem is the filename being recoded (I always get confused with .encode and .decode).

As an unencoded string, AIX open() probably doesn't care about 'bytes' such as \u304f, \u3089, \u3068 or \u307f

I tried to find what one of these might be (the 30 part surprised me) and I finally found, for better or worse:
image
See: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=12448&number=1024&addlinks=1&unicodeinhtml=dec
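
The same lookup can be done from Python with the standard `unicodedata` module. A small sketch (the `describe` helper is made up for the example) that lists each code point, its Unicode name, and its UTF-8 bytes:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())
        for ch in text
    ]


for row in describe("\u304f\u3089\u3068\u307f"):
    print(*row)
```

All four are Hiragana letters (KU, RA, TO, MI). Each is a single code point above 255, which is why the 'latin-1' codec reports "ordinal not in range(256)", and each takes three bytes in UTF-8.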

Anyway, the tarfile extraction:

[email protected]:[/home/aixtools]ls -l *.gz
-rw-rw-r--    1 aixtools aixtools    6135744 Jan 31 20:35 ansible-base-2.10.17.tar.gz
[email protected]:[/home/aixtools]gzip -dc ansible-base-2.10.17.tar.gz | tar xf -
[email protected]:[/home/aixtools]echo $?
0
[email protected]:[/home/aixtools]ls -ld ansible-base*17
drwxr-xr-x   11 aixtools aixtools       4096 Jan 31 20:34 ansible-base-2.10.17
[email protected]:[/home/aixtools]ls -l ansible-base-2.10.17
total 200
-rw-r--r--    1 aixtools aixtools      35148 Jan 31 20:33 COPYING
-rw-r--r--    1 aixtools aixtools       1854 Jan 31 20:33 MANIFEST.in
-rw-r--r--    1 aixtools aixtools      10008 Jan 31 20:33 Makefile
-rw-r--r--    1 aixtools aixtools       8224 Jan 31 20:34 PKG-INFO
-rw-r--r--    1 aixtools aixtools       5627 Jan 31 20:33 README.rst
-rw-r--r--    1 aixtools aixtools        917 Jan 31 20:34 SYMLINK_CACHE.json
drwxr-xr-x    2 aixtools aixtools       4096 Jan 31 20:34 bin
drwxr-xr-x    2 aixtools aixtools        256 Jan 31 20:34 changelogs
drwxr-xr-x    6 aixtools aixtools        256 Jan 31 20:34 docs
drwxr-xr-x    3 aixtools aixtools        256 Jan 31 20:34 examples
drwxr-xr-x    3 aixtools aixtools        256 Jan 31 20:34 hacking
drwxr-xr-x    3 aixtools aixtools        256 Jan 31 20:34 lib
drwxr-xr-x    2 aixtools aixtools        256 Jan 31 20:34 licenses
drwxr-xr-x    8 aixtools aixtools        256 Jan 31 20:34 packaging
-rw-r--r--    1 aixtools aixtools        361 Jan 31 20:33 requirements.txt
-rw-r--r--    1 aixtools aixtools      15418 Jan 31 20:33 setup.py
drwxr-xr-x    8 aixtools aixtools        256 Jan 31 20:34 test

@aixtools
Author

aixtools commented Feb 3, 2022

Can you also confirm, if extracting the file works, can you ls it?

My suspicion is that the AIX tar is going to use the raw (UTF-8 encoded) bytes of the filename and write that to the filesystem. That's going to result in a file that's named using an encoding different from the configured system encoding.

I don't know enough about Unix systems to say that's wrong, but at a minimum, it's going to badly confuse a lot of tools...

I think this is what you are looking for:

[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]ls | od -cox
0000000    f   o   o   .   t   x   t  \n   t   e   s   t   -   u   n   a
          063157  067456  072170  072012  072145  071564  026565  067141
            666f    6f2e    7478    740a    7465    7374    2d75    6e61
0000020    r   c   h   i   v   e   -   n   o   n   a   s   c   i   i   -
          071143  064151  073145  026556  067556  060563  061551  064455
            7263    6869    7665    2d6e    6f6e    6173    6369    692d
0000040  032 032 032 032   .   t   a   r   .   g   z  \n
          015032  015032  027164  060562  027147  075012
            1a1a    1a1a    2e74    6172    2e67    7a0a

Update:

[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]gzip -dc test-unarchive-nonascii-.tar.gz | tar tf -
gzip: test-unarchive-nonascii-.tar.gz: No such file or directory
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]gzip -dc test-unarchive-*.tar.gz | tar tf -
storage/
storage/a▒\200a▒\202æçe▒\201e▒\200i▒\210i▒\202o▒\202▒\223(copy)!@#$%^&-().jpg

Update 2:

[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]gzip -dc test-unarchive-*.tar.gz | tar xf -
[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]echo $?
0
[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]ls -l storage
total 16
-rwxrwxr-x    1 aixtools aixtools       4862 Nov 12 2014  àâæçéèïîôœ(copy)!@#$%^&-().jpg
[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]ls storage
àâæçéèïîôœ(copy)!@#$%^&-().jpg
[email protected]:[/home/aixtools/ansible-base-2.10.17/test/integration/targets/unarchive/files]ls storage | od -cbdx
0000000    a   ▒ 200   a   ▒ 202   ▒   ▒   c   ▒   ▒   e   ▒ 201   e   ▒
         141 314 200 141 314 202 303 246 143 314 247 145 314 201 145 314
           25036   32865   52354   50086   25548   42853   52353   26060
            61cc    8061    cc82    c3a6    63cc    a765    cc81    65cc
0000020  200   i   ▒ 210   i   ▒ 202   o   ▒ 202   ▒ 223   (   c   o   p
         200 151 314 210 151 314 202 157 314 202 305 223 050 143 157 160
           32873   52360   27084   33391   52354   50579   10339   28528
            8069    cc88    69cc    826f    cc82    c593    2863    6f70
0000040    y   )   !   @   #   $   %   ^   &   -   (   )   .   j   p   g
         171 051 041 100 043 044 045 136 046 055 050 051 056 152 160 147
           31017   08512   08996   09566   09773   10281   11882   28775
            7929    2140    2324    255e    262d    2829    2e6a    7067
0000060   \n
         012
           02560
            0a00
0000061

@aixtools
Author

aixtools commented Feb 3, 2022

#FYI

  • What I am sure you already knew - that this would not work:
  File "/root/py39/lib/python3.9/site-packages/pip/_internal/utils/unpacking.py", line 219, in untar_file
    with open(path, "wb", encoding=sys.getfilesystemencoding()) as destfp:
ValueError: binary mode doesn't take an encoding argument
  • changing the tarfile.open() call to: tar = tarfile.open(filename, mode, encoding=sys.getfilesystemencoding()) does not work for ansible-base==2.10.15 but does work for ansible-base==2.10.14
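
The first error above can be reproduced in isolation. The `encoding` argument of `open()` describes how file contents are converted in text mode; it has no effect on how the filename is encoded, and in binary mode it is rejected outright. A minimal sketch with a made-up helper name:

```python
import os
import tempfile


def try_open_binary_with_encoding(path):
    """Return the error message from open(..., "wb", encoding=...)."""
    try:
        with open(path, "wb", encoding="utf-8"):
            pass
    except ValueError as exc:
        return str(exc)
    return None


with tempfile.TemporaryDirectory() as tmp:
    print(try_open_binary_with_encoding(os.path.join(tmp, "demo.bin")))
```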

@pfmoore
Member

pfmoore commented Feb 3, 2022

I'm sorry but I don't think this is something that pip can fix. The use of UTF-8 to read filenames in sdists is a requirement of the sdist specification, and the change in pip was to conform to that spec. The problem that pip can't write the files in the sdist to the filesystem is because Python won't write files if the filename can't be encoded in the system filesystem encoding.

There are, it seems to me, three possible solutions here:

  1. Get the ansible project to stop using Unicode filenames in their sdists, so that the sdists are non-UTF8-system friendly. That's a workaround, and doesn't actually address the issue, but is probably the only short-term solution.
  2. Revisit the sdist specification (and probably the wheel specification as well!) to document what the intended behaviour is when wheels/sdists are unpacked on systems that are configured in such a way that Python can't write files with arbitrary Unicode names. This discussion should not happen here, as it's a decision that affects more tools than just pip, so it would need to be moved to Discourse.
  3. This one, you're not going to like, but pip simply traps the current error and provides a more friendly error message, something along the lines of "sdist XXX contains a file named YYY that cannot be extracted on your system because the filesystem encoding does not support it".

Personally, I think that (3) (possibly combined with (1)) is the right solution here.
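
A minimal sketch of what option (3) could look like. This is not pip's implementation: the exception class, the function, and its parameters are hypothetical, and the check only approximates what the OS layer does by encoding the name with the filesystem encoding and its error handler.

```python
import sys


class UnencodableFilenameError(Exception):
    """Hypothetical error for archive members the filesystem cannot name."""


def check_member_name(sdist, member, fs_encoding=None):
    """Raise a readable error if `member` cannot be encoded for the filesystem."""
    fs_encoding = fs_encoding or sys.getfilesystemencoding()
    try:
        member.encode(fs_encoding, sys.getfilesystemencodeerrors())
    except UnicodeEncodeError as exc:
        raise UnencodableFilenameError(
            f"sdist {sdist} contains a file named {member!r} that cannot be "
            f"extracted on your system because the filesystem encoding "
            f"({fs_encoding}) does not support it"
        ) from exc


try:
    check_member_name(
        "ansible-base-2.10.17.tar.gz",
        "test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz",
        fs_encoding="iso8859-1",  # simulate the AIX setting from this thread
    )
except UnencodableFilenameError as err:
    print(err)
```

Passing `fs_encoding` explicitly is only there so the behaviour can be tried on a UTF-8 system; a real check would rely on the system default.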

@aixtools
Author

aixtools commented Feb 3, 2022

@pfmoore
Member

pfmoore commented Feb 3, 2022

if you don't have a 'utf-8' compatible filesystem - go away aka, get a different OS with a supported filesystem.

Not what I meant. What I meant was "If you don't have a UTF-8 filesystem, the standard doesn't explain what tools should do, and there's no easy answer other than to abort. If you want a better answer, it should be documented in the standard for all tools to follow, and not be pip-specific behaviour."

I am saddened that a file included to test, ie, written to not fit non-utf8 filesystems is the insurmountable problem.

It's not insurmountable, they could create the file in the setup of the test, rather than including it directly.

To be 100% clear on why I don't view this as a pip issue, consider the following Python code:

with open("test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz", "w") as f:
    f.write("Hello")

I expect that would fail on your system. If Python can't write this file, how do you expect pip to unpack the sdist? (And that's not rhetorical, if you can explain how you'd want that file to be written on your system, in a way that would work as part of the ansible source code, then we could consider how to make that the expected behaviour of tools that read sdists. Remember to consider how your proposal would work for other "limited" encodings, like ASCII or gb2312, or something truly weird like EBCDIC...)

@aixtools
Author

aixtools commented Feb 4, 2022

  • First and foremost, thx for your time.
  • Please forgive my frustration (and now I byte my tongue)
  • For the record, your assumption is correct:
Python 3.9.9 (main, Jan 27 2022, 08:00:01) [C] on aix
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz", "w") as f:
...     f.write("Hello")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 24-27: ordinal not in range(256)
  • You are right, pip is not the root-cause. Neither can I say python is buggy in your example above - I am sure it is documented as unpredictable if not as sure to fail.
  • I would request that pip download at least complete the download. It has downloaded the package after all, in order to even fail this test. (Or is there an option I am unaware of, comparable to --nodeps for RPM?)
  • And, in a week, I'll make a request to ansible-base to pack this test in a regular tar file - that the test unpacks - so that transport (i.e., download) proceeds normally.
  • I suspect ansible-base "merged" this change, or at least followed your example - which could explain why ansible-base-2.10.14 unpacks/downloads successfully with pip<=21.2 while ansible-base>=2.10.15 cannot, regardless of the pip level installed.

++++
On an unrelated matter, which I think has to do with pip-debug, here is the following (cleaned) control character stream:

+------------------------------- Traceback (most recent call last) --------------------------------+
|                                                                                                  |
| /home/aixtools/py39/bin/pip:8 in <module>                                                        |
|                                                                                                  |
|   5 from pip._internal.cli.main import main                                                      |
|   6 if __name__ == '__main__':                                                                   |
|   7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         |
| \u2771 8     sys.exit(main())                                                                         |
|   9                                                                                              |
|                                                                                                  |
| +------------------------------------------ locals -------------------------------------------+  |
  • The line # 8 has the line number preceded by \u2771
  • Looking at the raw stream - notice the escape sequence ^[[0m precedes the content '8', rather than after - as in the lines with '7' and '9'
�[31m|�[0m   �[2m7 �[0m    sys.argv[�[94m0�[0m] = re.sub(�[33mr�[0m�[33m'�[0m�[33m(-script�[0m�[33m\�[0m�[33m.pyw|�[0m�[33m\�[0m�[33m.exe)?$�[0m�[33m'�[0m, �[33m'�[0m�[33m'�[0m, sys.argv[�[94m0�[0m])                         �[31m|�[0m
�[31m|�[0m �[31m\u2771 �[0m8     sys.exit(main())                                                                         �[31m|�[0m
�[31m|�[0m   �[2m9 �[0m                                                                                             �[31m|�[0m
  • Note also that ^[ is being drawn as �

  • With some guidance - I'll post a new issue (or feel free to do it yourself - better than I ever could)

  • My best guess is that there is a .format() typo somewhere.

@pfmoore
Member

pfmoore commented Feb 4, 2022

I would request that pip download at least complete the download.

One thing I'd not focused on is that this is pip download, not pip install. The reason we need to unpack the sdist for download is just to get the metadata. Maybe pip download --no-deps will work, but at that point it's not obvious how much benefit that gives over simply downloading the files manually.

PEP 643 might help avoid the need to unpack the whole sdist for the metadata. But uptake on that has been slow, so that's not a solution in the short to medium term.
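To illustrate the idea of getting metadata without unpacking the whole sdist, here is a minimal sketch that reads PKG-INFO straight out of the archive in memory, so no member name is ever written to the filesystem. This is not how pip does it: the function names `read_pkg_info` and `build_demo_sdist` are hypothetical, the demo archive is made up, and the sketch does not check the Metadata-Version or Dynamic fields that PEP 643 relies on to decide which values can be trusted.

```python
import io
import tarfile
from email.parser import HeaderParser


def read_pkg_info(fileobj):
    """Return the parsed top-level PKG-INFO of an sdist, or None."""
    with tarfile.open(fileobj=fileobj, mode="r:*", encoding="utf-8") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # An sdist keeps PKG-INFO directly under its single top directory.
            if member.isfile() and len(parts) == 2 and parts[1] == "PKG-INFO":
                raw = tar.extractfile(member).read()
                return HeaderParser().parsestr(raw.decode("utf-8"))
    return None


def build_demo_sdist():
    """Build a made-up pax-format sdist in memory for the demonstration."""
    files = {
        "demo-1.0/PKG-INFO": b"Metadata-Version: 2.2\nName: demo\nVersion: 1.0\n",
        "demo-1.0/test-\u304f\u3089\u3068\u307f.txt": b"Hello",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


metadata = read_pkg_info(build_demo_sdist())
print(metadata["Name"], metadata["Version"])
```

Because only the PKG-INFO member is read, the non-ASCII member in the demo archive never has to be created on disk.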

@aixtools
Author

aixtools commented Feb 4, 2022

  • thanks.
  • As to download (with --no-deps): maybe a new option (--no-prebuilds), as an ease-of-use measure.
  • In the case of ansible-base - it was the first failure when I tried to download ansible, but since it never passed itself I had to go back to old installations to see what the requirements for ansible-base are, and then restarted with them, and, one-by-one, worked out what could be done and what has become impossible (e.g., no rust(-lang) for AIX, so the latest cryptography is impossible). This greatly added to my frustration above; no excuse, "byte my tongue again".
  • in short, an ease of use to have everything downloaded, but not built - or even attempted to build. That becomes my step-by-step task.
  • pure-python always seems to install - when requirements have been satisfied. And, for a platform like AIX - that no one in the python world really knows (anymore, but hopefully that will change now that Python 3.9 is installed by default on AIX 7.3).
  • forgive rambling: in a word: ease of use (not having to go from web-site to web-site to get modules).

@pfmoore
Member

pfmoore commented Feb 4, 2022

To be clear, it's not a "prebuild", it's how we get the metadata for the project, so your suggested --no-prebuilds wouldn't work (or it would be identical to --no-deps).

All I can think of is that you use (or more likely write, as I don't know of one that exists) a tool to determine, from the package dependency graph, what you need to download. And then use pip download --no-deps (or even just a simple wget) to do the downloads.
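A minimal sketch of what such a tool's core could look like, assuming the dependency information has already been collected into a plain mapping (how it is collected is left open here). The names `download_set`, `download_commands` and the `DEMO_REQUIRES` data are hypothetical and purely illustrative; only the `pip download --no-deps` command line comes from the suggestion above.

```python
from collections import deque

# Made-up dependency data: project name -> names it requires.
DEMO_REQUIRES = {
    "top": ["mid-a", "mid-b"],
    "mid-a": ["leaf"],
    "mid-b": ["leaf"],
    "leaf": [],
}


def download_set(root, requires):
    """Walk the dependency graph breadth-first from `root`.

    Returns every project reachable from `root`, each listed once,
    in the order it was first discovered.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        name = queue.popleft()
        for dep in requires.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def download_commands(root, requires):
    """Turn the download set into one command line per project."""
    return [f"pip download --no-deps {name}" for name in download_set(root, requires)]


for command in download_commands("top", DEMO_REQUIRES):
    print(command)
```

A real tool would also need to handle version specifiers, extras and environment markers, none of which this sketch attempts.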

Or pin pip to 21.1, which may be the simplest option.

@aixtools
Author

aixtools commented Feb 4, 2022

To be clear, it's not a "prebuild", it's how we get the metadata for the project, so your suggested --no-prebuilds wouldn't work (or it would be identical to --no-deps).

  • I'll experiment

All I can think of is that you use (or more likely write, as I don't know of one that exists) a tool to determine, from the package dependency graph, what you need to download. And then use pip download --no-deps (or even just a simple wget) to do the downloads.

Or pin pip to 21.1, which may be the simplest option.

  • I went and read up on PEP 517, and found the description/specification:
The generated tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names.
invalid=action
    (Applicable only to the -x pax format.) This keyword allows user control over the action pax takes upon encountering values in an extended header record that, in read or copy mode, are invalid in the destination hierarchy or, in list mode, cannot be written in the codeset and current locale of the implementation. The following are invalid values that shall be recognized by pax:

        In read or copy mode, a filename or link name that contains character encodings invalid in the destination hierarchy. (For example, the name may contain embedded NULs.)

        In read or copy mode, a filename or link name that is longer than the maximum allowed in the destination hierarchy (for either a pathname component or the entire pathname).

        In list mode, any character string value (filename, link name, user name, and so on) that cannot be written in the codeset and current locale of the implementation.

    The following mutually-exclusive values of the action argument are supported:

    binary
        In write mode, pax shall generate a hdrcharset=BINARY extended header record for each file with a filename, link name, group name, owner name, or any other field in an extended header record that cannot be translated to the UTF-8 codeset, allowing the archive to contain the files with unencoded extended header record values. In read or copy mode, pax shall use the values specified in the header without translation, regardless of whether this may overwrite an existing file with a valid name. In list mode, pax shall behave identically to the bypass action.
    bypass
        In read or copy mode, pax shall bypass the file, causing no change to the destination hierarchy. In list mode, pax shall write all requested valid values for the file, but its method for writing invalid values is unspecified.
    rename
        In read or copy mode, pax shall act as if the -i option were in effect for each file with invalid filename or link name values, allowing the user to provide a replacement name interactively. In list mode, pax shall behave identically to the bypass action.
    UTF-8
        When used in read, copy, or list mode and a filename, link name, owner name, or any other field in an extended header record cannot be translated from the pax UTF-8 codeset format to the codeset and current locale of the implementation, pax shall use the actual UTF-8 encoding for the name. If a hdrcharset extended header record is in effect for this file, the character set specified by that record shall be used instead of UTF-8. If a hdrcharset=BINARY extended header record is in effect for this file, no translation shall be performed.
    write
        In read or copy mode, pax shall write the file, translating the name, regardless of whether this may overwrite an existing file with a valid name. In list mode, pax shall behave identically to the bypass action.

    If no -o invalid=option is specified, pax shall act as if -o invalid=bypass were specified. Any overwriting of existing files that may be allowed by the -o invalid= actions shall be subject to permission ( -p) and modification time (-u) restrictions, and shall be suppressed if the -k option is also specified.
  • Maybe using the PAX utility to unpack a tarball rather than tar - allows the platform to deal with invalid situations.
  • When pax is not available the fallback could be tar
  • Idea?

AIX 6.1 (so quite old!) pax manpage excerpt (reformatted for legibility)

       Item
            Description
       -o Options (Continued)
            invalid=action (Applicable only to the -x pax format.)

            This keyword allows user control over the action pax takes upon encountering values in an extended header record that:
              *    in read or copy mode, are invalid in the destination hierarchy, or
              *    in list mode, cannot be written in the code set and current locale.
            pax recognizes these invalid values:
              *    In read or copy mode, a file name or link name that contains character encodings invalid in the destination hierarchy. (For example, the name
                   may contain embedded NULLs.)
              *    In read or copy mode, a file name or link name that is longer than the maximum allowed in the destination hierarchy (for either a path name
                   component or the entire path name).
              *    In list mode, any character string value (file name, link name, user name, and so on) that cannot be written in the code set and current
                   locale.
            These mutually exclusive values of the action argument are supported:
              *    bypass

                   In read or copy mode, pax bypasses the file, causing no change to the destination hierarchy. In list mode, pax writes all requested valid
                   values for the file, but its method for writing invalid values is unspecified.
              *    rename

                   In read or copy mode, pax acts as if the -i flag is in effect for each file with invalid file name or link name values, allowing the user to
                   provide a replacement name interactively. In list mode, pax behaves identically to the bypass action.
              *    UTF8

                   When used in read, copy, or list mode and a file name, link name, owner name, or any other field in an extended header record cannot be
                   translated from the pax UTF8 code set format to the current code set and locale, pax uses the actual UTF8 encoding for the name.
              *    write

                   In read or copy mode, pax writes the file, translating or truncating the name, regardless of whether this may overwrite an existing file with
                   a valid name. In list mode, pax behaves identically to the bypass action.

                   If no -o invalid=action is specified, pax acts as if the bypass action is specified. Any overwriting of existing files that may be allowed by
                   the -o invalid=actions is subject to permission (-p) and modification time (-u) restrictions, and is suppressed if the -k flag is also
                   specified.
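For comparison, a minimal sketch of what a "bypass"-like policy could look like with Python's tarfile module: members whose names cannot be encoded for the destination are set aside instead of aborting the whole extraction. This is an illustration only, not behaviour that pip or tarfile provides; the function name `split_by_encodability` and the `fs_encoding` parameter are hypothetical.

```python
import sys
import tarfile


def split_by_encodability(tar, fs_encoding=None):
    """Partition archive members into (writable, bypassed).

    A member is bypassed when its name or link name cannot be encoded
    in the destination encoding, loosely mirroring pax's invalid=bypass.
    """
    fs_encoding = fs_encoding or sys.getfilesystemencoding()
    writable, bypassed = [], []
    for member in tar.getmembers():
        try:
            member.name.encode(fs_encoding)
            member.linkname.encode(fs_encoding)
        except UnicodeEncodeError:
            bypassed.append(member)
        else:
            writable.append(member)
    return writable, bypassed
```

The `writable` list could then be passed as the `members` argument of `TarFile.extractall()`, with the `bypassed` names reported to the user. Whether silently skipping files is ever safe for a source distribution is a separate question.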

@pradyunsg
Member

pradyunsg commented Feb 4, 2022

For --no-deps, pip will still unpack the downloaded distribution. See #1884.

@aixtools
Author

aixtools commented Feb 11, 2022

Thx. On vacation.

Question: is it an idea to use the pax utility with arguments, rather than (g)tar? After all, it is the pax norm that is specified, not tar. And, more importantly, pax is coded to deal with file-system-encoding exceptions.

@pfmoore
Member

pfmoore commented Feb 11, 2022

Internally pip uses the stdlib tarfile module. We're not going to shell out to an external program for this, as that adds a whole bunch of risk that we don't want to take.

@aixtools
Author

hmm. feels like they missed something when the pep covering this was written specifying pax conformity, again, since pax was specified to cover the requirement while tar was not.

beginning to feel like no win. wonder how it is going to be looked at down the road (since python 3.9 is part of aix 7.3 bos). not your problem, nor mine. just a hassle to me atm.

thx for the feedback.

@aixtools
Author

p.s. maybe there is something in https://github.com/python/cpython/blob/3.10/Lib/tarfile.py (e.g.) and pax_headers=xxx, but study exceeds my vacation time for comments. :)

@pfmoore
Member

pfmoore commented Feb 11, 2022

feels like they missed something when the pep covering this was written specifying pax conformity

The pax conformity required by the pep is provided by the tarfile module. I remember we explicitly checked this when we wrote the PEP. The problem here is nothing to do with the format, it's because some files that can be stored in pax format, can't be extracted if the filesystem/os encoding doesn't support writing files with that name.

The AIX pax utility appears to do something at bytes level for un-encodable filenames which is not part of the pax format specification.
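The distinction between the archive format and the filesystem can be seen in a minimal sketch that never touches the disk: tarfile writes a pax-format archive containing a non-ASCII member name into memory and reads the name back unchanged. The helper name `pax_roundtrip_names` is hypothetical, and the member name is reused from the example earlier in this thread.

```python
import io
import tarfile

NAME = "pkg/test-unarchive-nonascii-\u304f\u3089\u3068\u307f.tar.gz"


def pax_roundtrip_names(names):
    """Write empty members with the given names to an in-memory pax
    archive, then read the archive back and return the stored names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding="utf-8") as tar:
        return tar.getnames()


print(pax_roundtrip_names([NAME, "pkg/plain.txt"]) == [NAME, "pkg/plain.txt"])
```

Storing and listing the name works without involving the filesystem encoding; it is only the later step of creating a file with that name that can fail.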

@aixtools
Author

well - I am on vacation - so cannot test anything atm. But what should I see to verify that the pax utility is not working? If it isn't doing what the man page has stated - for years - re: how to react to filenames that do not encode to iso-8859-1 aka latin1, then that is a bug - and I'll try to find a way to get it submitted as such to IBM.

The pax conformity required by the pep is provided by the tar module - yet I recall reading that people expected more issues with file systems that do not support utf-8 natively.

I only hope to find a solution - if it is a bug in AIX that needs addressing (pax implementation is wrong) - I'll go for it, but the tar module exceeds my current comprehension of pax and python (I merely try to package python and modules, not develop).

I just fear, down the road, several issues for AIX and python - and expect them to be more noticeable because it is included in AIX 7.3 (and my role as packager goes away).

Bottom line: if I can help - I'll do what I can - and ask for your understanding when it (helping) is beyond my abilities.

@pfmoore
Member

pfmoore commented Feb 11, 2022

I don't think there's much to do here. My reading of the situation is as follows:

  1. The sdist specification supports filenames using any Unicode character in sdists. But it doesn't say anything about how tools should react when asked to unpack a sdist that contains a filename that Python can't create on the target system.
  2. Python relies on the locale settings on Unix when encoding bytes for creating files in the filesystem (this is arguably wrong if you take a hardline "filenames are bytes, not characters" stance, but it's standard Python behaviour).
  3. Pip expects to be able to unpack sdists to build/install them. It also currently needs to unpack them as part of pip download; #1884 ("Avoid generating metadata in pip download --no-deps ...") will remove this requirement for the specific case of pip download --no-deps.
  4. The vast majority of sdists don't use "unusual" Unicode characters in filenames, so this is a rare situation. Ansible is the only project I know of that does it.

Configuring an AIX system to use UTF-8 as the filesystem encoding (like most Linux systems do these days) would remove the issue, but it's a global change that would have much wider implications. Asking the ansible project to avoid Unicode filenames might be a practical workaround, too.
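A minimal sketch for inspecting which encoding Python will actually use for filenames on a given system, using only standard library calls. The function name `describe_filename_encoding` is hypothetical. The `utf8_mode` flag reflects Python's UTF-8 mode (`-X utf8` or `PYTHONUTF8=1`, Python 3.7+), which makes Python use UTF-8 as its filesystem encoding regardless of locale; whether that is an acceptable setting on an AIX system is an assumption this thread does not establish.

```python
import locale
import sys


def describe_filename_encoding():
    """Collect the settings that decide how Python encodes filenames."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "locale_encoding": locale.getpreferredencoding(False),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


for key, value in describe_filename_encoding().items():
    print(f"{key}: {value}")
```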

It's unlikely pip will be modified to handle this case, it's basically far too rare, and there's no reasonable behaviour that would be guaranteed safe. We don't want to get into those sort of heuristics - we follow the behaviour of Python's open function here as that's well-defined and well-known.

I'm inclined to close this as "no action" / "out of scope" (basically "won't fix", but we don't have a label for that) because I don't think there's any change to pip that we'd be willing to make to handle this.

@pradyunsg pradyunsg added resolution: no action When the resolution is to not do anything and removed state: needs eyes Needs a maintainer/triager to take a closer look labels Feb 11, 2022
@pradyunsg
Member

I pretty much agree with @pfmoore here, so I've gone ahead and applied the R: no action label here; and am going to close this.

AIX is not a platform that pip supports anyway (https://pip.pypa.io/en/stable/installation/#compatibility). I'll take this as a good reminder to update our docs to note that pip working on unsupported platforms (eg: AIX) is considered incidental and "it runs != it is supported".

@aixtools
Author

Configuring an AIX system to use UTF-8 as the filesystem encoding (like most Linux systems do these days) would remove the issue, but it's a global change that would have much wider implications. Asking the ansible project to avoid Unicode filenames might be a practical workaround, too.

  • IMHO: unlikely that IBM would make this change and risk breaking binary compatibility

It's unlikely pip will be modified to handle this case, it's basically far too rare, and there's no reasonable behaviour that would be guaranteed safe. We don't want to get into those sort of heuristics - we follow the behaviour of Python's open function here as that's well-defined and well-known.

  • understood: Windows is a supported system, and AIX is not. So even though earlier changes anticipated problems eventually AIX is still in a limbo area - not quite supported.

I'm inclined to close this as "no action" / "out of scope" (basically "won't fix", but we don't have a label for that) because I don't think there's any change to pip that we'd be willing to make to handle this.

  • As Python is part of AIX 7.3 base - I hope you (pypa, Python) won't actively oppose seeing AIX as a supported platform. I know I tried for years to meet as many of the demands as I could - but I was not IBM enough to satisfy anyone, and bored with always hearing about AIX's limbo status - I stopped working on platform issues.
  • As always, I wish you the best and thank you for what you have done ( ;) just not for AIX)
  • Also thanks for the expanded clarification.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 17, 2022