Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] uv publish --skip-existing #7917

Closed
Tracked by #7839
chassing opened this issue Oct 4, 2024 · 29 comments · Fixed by #8531
Closed
Tracked by #7839

[feat] uv publish --skip-existing #7917

chassing opened this issue Oct 4, 2024 · 29 comments · Fixed by #8531
Assignees
Labels
enhancement New feature or improvement to existing functionality

Comments

@chassing
Copy link

chassing commented Oct 4, 2024

It would be great if uv publish had a --skip-existing option that ignores errors from files already on PyPI, similar to poetry's and twine's --skip-existing options.

@konstin
Copy link
Member

konstin commented Oct 4, 2024

Do you have a specific case that fails? uv publish already ignores it if you try to upload the same file again.

@chassing
Copy link
Author

chassing commented Oct 4, 2024

Mmmmhhh strange, I got this error:

10:57:30 uv build
10:57:30 Building source distribution...
10:57:30 Building wheel from source distribution...
10:57:30 Successfully built dist/rh_aws_saml_login-0.3.4.tar.gz and dist/rh_aws_saml_login-0.3.4-py3-****-any.whl
10:57:30 uv publish
10:57:30 warning: `uv publish` is experimental and may change without warning
10:57:30 Publishing 2 files https://upload.pypi.org/legacy/
10:57:30 Uploading rh_aws_saml_login-0.3.4-py3-****-any.whl (3.7KiB)
10:57:30 error: Failed to publish `dist/rh_aws_saml_login-0.3.4-py3-****-any.whl` to https://upload.pypi.org/legacy/
10:57:30   Caused by: Upload failed with status code 400 Bad Request: 400 File already exists ('rh_aws_saml_login-0.3.4-py3-****-any.whl', with blake2_256 hash '6c36e28fc15545d4e3d10a0c68e9d476afdb6ae2facdbb846a10098516575d0d'). See https://pypi.org/help/#file-name-reuse for more information.
10:57:30 make: *** [Makefile:39: pypi] Error 2

@chassing
Copy link
Author

chassing commented Oct 4, 2024

TBH, it's not exactly the same file, it's the lastest master, but it's the same version.

@konstin
Copy link
Member

konstin commented Oct 4, 2024

Could you give some more context on why you're doing the second upload? In the final uploaded version, would the source distribution and the wheel be from different commits?

@chassing
Copy link
Author

chassing commented Oct 4, 2024

I'm sorry. Our development and release flow is the following: Users create PRs, and each PR is merged into the main branch. The main branch is automatically built and published to PyPI. As long nobody bumps the version number in pyproject.toml, we don't want to publish (release) a new version. We used poetry in the past, and the --skip-existing option did the job.

@TiansuYu
Copy link

TiansuYu commented Oct 4, 2024

Do you have a specific case that fails? uv publish already ignores it if you try to upload the same file aga

No it doesn't. UV will try to upload all pre-existing ones.

e.g. if I have published version 1.0.0, now wants to build and publish 1.0.1 locally, it it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

@konstin
Copy link
Member

konstin commented Oct 4, 2024

e.g. if I have published version 1.0.0, now wants to build and publish 1.0.1 locally, it it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

Are these 1.0.0 files the same that are on the index, and when uploading, do you get a "already exists but ok" message or do you get an error?

@davidszotten
Copy link
Contributor

for the ci workflow @chassing mentioned (which is similar to what we use) the files aren't necessarily identical. they point is to use "this file already exists in the index" as a shorthand for "version hasn't changed, no need to upload" (by treating that specific error as being ok). i ported the twine approach to poetry, it's based on some heuristics https://github.com/pypa/twine/blob/ae71822a3cb0478d0f6a0cccb65d6f8e6275ece5/twine/commands/upload.py#L58-L72 so a bit hacky, but it works very well in practice

@TiansuYu
Copy link

TiansuYu commented Oct 4, 2024

So I can verify that if the file is identical, the uv publish will pass, but will break if the code changes. And there is no info on "already exists but ok"
Screenshot 2024-10-04 at 17 21 48
uv 0.4.18 (7b55e9790 2024-10-01)

@charliermarsh charliermarsh added the needs-decision Undecided if this should be done label Oct 8, 2024
@charliermarsh
Copy link
Member

@konstin -- Where did we land on this?

@zanieb
Copy link
Member

zanieb commented Oct 8, 2024

It seems problematic to try to upload different content and expect it to be treated the same way as uploading an identical file? It also seems reasonable to opt-in to ignoring failures in that case? Unless we want to add a command to check if a matching version exists or something? But that exposes you to TOCTOU bugs.

@charliermarsh
Copy link
Member

Yeah agree, which would lead us to adding a skip existing flag IIUC.

@davidszotten
Copy link
Contributor

(if you decide to accept the feature request i'd be interested in having a go at implementing it (with some pointers))

@konstin
Copy link
Member

konstin commented Oct 9, 2024

With pypi today, a new release is created once you upload the first file for that release. Uploading additional files with the same package name and version number but a different filename adds them to the existing release. That is suboptimal: A release is published once a single file exists even if the wheels the users want are still missing, and it means that if something in your release process fails, you have a release with half the files. The pypi admins are also aware that this is undesirable, and there is work going on to fix this (https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879).


Separately, unfortunately it seems that different implementations differ in how they react to reuploading files

  • PyPI: When uploading the same file again, you get a 200 OK with an empty body. When uploading a different file, you get an error. We can’t tell apart whether an upload was successful or skipped.
  • GitLab: When uploading a file with the same name, you get a “400 Bad request - Validation failed: File name has already been taken”, no matter whether the file was identical or not.
  • Codeberg/forgejo: Error 409 “package file already exists”, no matter whether the file was identical or not.

The current skip existing behavior was based on PyPI’s behavior, while copying twine response behavior for compatibility with alternative indexes (the ones @davidszotten mentioned) as default without parameters (misunderstanding how those behave for non-pypi), resulting in a strange mix of behaviours.


Skip existing packages in its basic form is necessary: If a CI publish task fails mid-way uploading files, the user should be able to restart the whole workflow, with only new files being reuploaded.

Preferably, we’d error when we try to upload files that have the same name when uploading, but different content, otherwise the following scenario could: You build foo 1.2.3 into foo-1.2.3.tar.gz and foo-1.2.3-cp39-abi3-manylinux2014.whl, but upload only the source distribution foo-1.2.3.tar.gz (let’s say, because they wheel build failed or because the CI machine crashed after the source dist upload). Now you make some code changes, rebuild foo 1.2.3 and try uploading foo-1.2.3.tar.gz and foo-1.2.3-cp39-abi3-manylinux2014.whl. foo-1.2.3.tar.gz already exists, so only foo-1.2.3-cp39-abi3-manylinux2014.whl gets uploaded. Due to the code changes, the published source distribution and wheel mismatch: A mac or windows user building from the source distribution will get different code than the linux user, breaking the contract (and core assumption in packaging tools) that all files in a release match.

One option is adding index url as mandatory parameter to skip-existing: By using the hashes in the index URL, we check before uploading whether a file with the same filename does not yet exist (new upload), a file with the same name and same content/hash exists (we can skip it, such as a 1.0 that is still present in your dist/ when you're now publishing 1.1) or a file with the same name but different content exists (we error). The disadvantage is that this is more complex both on the user side and the uv side than twine's --skip-existing.

@konstin
Copy link
Member

konstin commented Oct 9, 2024

e.g. if I have published version 1.0.0, now wants to build and publish 1.0.1 locally, it it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

We could take a similar approach as poetry publish here and require the project's pyproject.toml to be present (either in the current directory or as part of the workspace), then only upload the version defined there. This would change CI workflows in some cases because it requires a checkout step for the publish step (e.g. if you build multiple wheels for different platforms, then collect and upload in a separate job, that job needs a checkout too).

@davidszotten
Copy link
Contributor

i agree that the current implementations are a bit of a hack (and from your detailed description, perhaps even dangerous). just a note that the proposal of checking version against pyproject.toml doesn't satisfy mine (or op's i think) use-case of using skip-existing as a proxy for "was the version just bumped". (i'm open to ideas of other ways of achieving this though)

@konstin konstin added the needs-design Needs discussion, investigation, or design label Oct 9, 2024
@TiansuYu
Copy link

TiansuYu commented Oct 9, 2024

In my workflow, I would also need to generate a dev version on the fly and publish a random dev version on every merge to main. Please also keep this in mind, when you devise uv publish. (In the end, probably just edit version in pyproject.toml in CD, and not interfere the workflow @konstin proposed here.)

@konstin
Copy link
Member

konstin commented Oct 9, 2024

@TiansuYu Could you expand on your overall workflow, i.e. how does the wheel get its random dev tag and where/how does this wheel from main get used?

@TiansuYu
Copy link

TiansuYu commented Oct 9, 2024

I would somehow edit the version string e.g. 1.0.0 -> 1.0.0.dev0 (using a script or something, or if uv can offer something like uv version dev kinda like poetry version patch but append a random version string.) Then run uv build and uv publish. This is what I mean.

@konstin
Copy link
Member

konstin commented Oct 12, 2024

For skip existing we can use the following design:

The user can pass --keep-existing <index url>. For each file given to upload, uv checks whether the file is already on the index. If the filename does not exist, uv will upload the file. If the filename does exist and the file on the index is the exact same as the local file (hash match), we skip the upload as there's nothing to do. If the file exists and mismatches with different content, we error: There is an inconsistency, and uv does not upload in this case.

The upload itself may then error or succeed. If it errors, we check the index URL again: Maybe there was a parallel upload of the same file, and the other upload was quicker (avoid TOCTOU errors). We then apply the same logic: If the filename does exist and the file on the index is the same (hash match), it's ok, there was an upload race. If the file exists and mismatches, we error: There is an inconsistency, and uv does not upload in this case. I'm not sure yet if we should do the second check only if the error is one of the twine-recognized file-already-existed conditions; i tend to think we can just do the second check on any error.

This enables re-trying publish, but it still requires the author of a package to ensure that they only try to publish each version from a specific commit. We impose this requirement to avoid situations where different files in a version were built from different sources, which breaks a fundamental assumption in uv.

@konstin konstin added enhancement New feature or improvement to existing functionality and removed needs-decision Undecided if this should be done needs-design Needs discussion, investigation, or design labels Oct 13, 2024
@JohnPaton
Copy link

Popping in to say I'm affected by this. We run a monorepo and all packages are built when a tag is pushed, but the tag doesn't change the version of all packages because of prefixing. So it tries to publish all built packages, but not all of them have a new version. twine's skip existing means that only the genuine changes are published. uv in its current state gives

$ uv publish
warning: `uv publish` is experimental and may change without warning
Publishing 10 files https://index.company.com/...
Uploading my_pkg-0.2.1.dev136+g27f686d-py3-none-any.whl (2.4KiB)
error: Failed to publish `dist/my_pkg-0.2.1.dev136+g27f686d-py3-none-any.whl` to https://index.company.com/...
  Caused by: Upload failed with status code 403 Forbidden. Server says: {
  "errors" : [ {
    "status" : 403,
    "message" : "Not enough permissions to delete/overwrite artifact 'index:my_pkg/0.2.1.dev136+g27f686d/my_pkg-0.2.1.dev136+g27f686d-py3-none-any.whl' (user: 'user' needs DELETE permission)."
  } ]
}

I could build a script to check which build are changed compared to the index and only publish those ones, but whyh would I when twine upload --skip-existing exists.

@charliermarsh
Copy link
Member

Did you look at --check-index support? #8531

@fwallacevt
Copy link

Hey @charliermarsh we are impacted by this as well - --check-url is surprising in how it works in that it will error if the file exists and does not match the local hash. This is fundamentally different than how we expect something akin to --skip-existing to work. With --check-url, we get an error error: Local file and index file do not match. The expected behavior is that if the filename exists in the index we've pointed uv to, it should exit successfully.

@charliermarsh
Copy link
Member

@fwallacevt -- I hear you, though that is an intentional design decision. We think it should be an error if you're trying to upload a package to an index and the contents differ from the existing version. It's a sign that something is off in the build or publish process.

@fwallacevt
Copy link

@charliermarsh we're using uv workspaces, and right now we run uv build ... --all, then publish everything that is built. What would you suggest as an alternative to only build packages if they should be built (e.g. version has changed)? Or is there another way you envision this workflow?

@konstin
Copy link
Member

konstin commented Nov 26, 2024

I'd recommend explicitly selecting the package(s) that changed and you want to publish. I know it's more effort than having the index figure it out, but it avoids publishing inconsistent sets of distributions for a version.

@JohnPaton
Copy link

That's our use-case too @fwallacevt... I guess we'll stick with twine for just the publishing step.

@fwallacevt
Copy link

@konstin @charliermarsh are you open to making this an optional feature? E.g. adding a flag that dictates whether a conflict results in a warning or an error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement to existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants