
Housekeeping #17

Merged
merged 39 commits into from
Aug 19, 2023
Conversation

exhuma

@exhuma exhuma commented Aug 14, 2023

This PR contains some opinionated housekeeping changes (#16). Feel free to add
counter-arguments to the proposed changes. As they are a matter of personal
preference, opinions may vary and I am open to discussion.

Please consider this PR a draft proposal.

Actionable points in this PR (discussed below):

  • Which license? ISC or MIT? Before this PR the project used both.
  • Is mainapp okay as name for the extra-dependency?
  • Who has responsibility for the generation of the requirements.txt file?
    The CI or the maintainer? Note that they may be different on each OS, so
    generating them in the CI might be beneficial.

Reasons why I chose to make these changes

Dropping Pipfile and setup.py in Favor of pyproject.toml

By moving everything into pyproject.toml, we remove duplication, and with it
the risk of diverging dependencies. Such divergence had in fact already
happened. Looking at Pipfile I found the following divergences from setup.py:

  • gunicorn (and gevent) was missing in setup.py. This makes sense as it
    is only required when running httpbin as a main process. I added an
    extra-dependency called mainapp to cover this use-case. This makes it
    installable as a runnable application using pip install httpbin[mainapp]. We
    might rename that extra to something else though.
  • six was missing in setup.py. It has been added to pyproject.toml
  • pyyaml, rope and meinheld have no imports anywhere. They have been
    dropped.
  • werkzeug had a minimum version in Pipfile, but not in setup.py. This
    has been merged into pyproject.toml
  • The License information stored in setup.py mismatched with the LICENSE
    file (ISC vs MIT). This has now been consolidated. It might be necessary to
    review which of the two is actually the correct one.
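
A sketch of what the consolidated pyproject.toml could look like after these changes (the dependency names follow the bullet points above, but the exact pins and the full list are illustrative, not the project's actual file):

```toml
[project]
name = "httpbin"
dependencies = [
    "Flask",
    "werkzeug >= 2.0",  # minimum version merged in from the old Pipfile
    "six",              # was missing from setup.py
]

[project.optional-dependencies]
# Only needed when running httpbin as a standalone main application
mainapp = [
    "gunicorn",
    "gevent",
]
```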

The main disadvantage is that the convenience commands from pipenv are no
longer available. They can be useful for managing virtual-envs and adding
new dependencies to the project.

In my opinion (and experience), those use-cases don't surface often in a
project. And the benefits of having everything in one place (i.e.
pyproject.toml) outweigh those rare conveniences.

pip-tools instead of Pipfile

Dropping pipenv also removes the possibility to use the pipenv lock
command. This can be easily reproduced using pip-compile from the pip-tools
package. It has the advantage of producing a standard requirements.txt file
(instead of the Pipfile.lock file).
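
For illustration, the lock step could look like this on the command line (the flags come from pip-tools' documented interface; the exact invocation used in this PR may differ):

```sh
# Pin all transitive dependencies, including the "mainapp" extra,
# into a standard requirements.txt with hashes
pip-compile --upgrade --generate-hashes --extra mainapp \
    --output-file requirements.txt pyproject.toml
```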

Removal of the VERSION file

This file caused an import side-effect: it is read as soon as httpbin is
imported, to provide the httpbin.version variable. In addition, the
contents of that file do not necessarily match the actually installed
package if there are packaging bugs in the project.

It is better to reach into the project meta-data from the environment.

Since Python 3.8, importlib.metadata.version can handle this just fine
(a backport exists for older versions), and it works well with
pyproject-build. This PR makes use of it, including the backport.

Docker Build Automation & requirements.txt

The timing of the generation of the "lock" file (independent of pipenv or
pip-tools) is, with this PR, left to the maintainer, by running
pip-compile --upgrade --generate-hashes and committing the result.

The requirements file could also be dropped from the repo and generated
"on-the-fly" during the CI run. In that case it is important to ensure that the
unit-tests are run with the same file.

An example multi-stage CI flow (using "needs") has been added to the repo. This
could be used to generate the needed files.
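
Such a staged flow could be sketched like this (job names, action versions and file names here are made up for illustration, not taken from the PR's actual workflow):

```yaml
jobs:
  lock:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install pip-tools && pip-compile --generate-hashes -o requirements.txt pyproject.toml
      - uses: actions/upload-artifact@v3
        with: {name: lockfile, path: requirements.txt}
  test:
    needs: lock   # unit-tests run against the exact same lock file
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/download-artifact@v3
        with: {name: lockfile}
      - run: pip install -r requirements.txt && pytest
```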

The requirements.txt file is only needed when running httpbin as main
application. When it is used as a library in another project, the
responsibility of generating the requirements file falls to that project. So,
as a result, it is not really necessary to package the requirements.txt file;
it could very well be generated on-the-fly when building the docker image.

Publishing Via GitHub actions

The new proposed workflow contains a job to publish to pypi. All that is
required is to create a workflow secret with the name PYPI_TOKEN.

Other Side-Effects and Notes

  • The test_suite key from setup.py is lost. This is, in my opinion,
    negligible.
  • The include_package_data key was dropped because it is true by default.
  • Some CI actions have been upgraded to get rid of some warning messages.
  • Removed the job-timeout in the CI workflow (Flaky CI runs #15). Some jobs ended up hitting
    this timeout, causing runs to fail seemingly at random.

@exhuma

exhuma commented Aug 14, 2023

Additional note: the proposed CI file builds and publishes the docker-image automatically to the GitHub container registry (ghcr). This is done mainly for illustrative purposes and can fairly easily be modified to push to docker-hub instead. I chose GHCR as it does not require special authentication. Or rather, the authentication is handled by GitHub.

@kevin1024

Wow! Thank you for the work on this. Packaging has really moved on lately.

Which license? ISC or MIT? Before this PR the project used both.

They seem functionally identical. Let's do MIT since it's more common.

Is mainapp okay as name for the extra-dependency?

Works for me

Who has responsibility for the generation of the requirements.txt file?

I've always done maintainer-controlled requirements but I'm good with CI if that simplifies releases.

pip install httpbin[mainapp]

Should this be in the dockerfile somehow? Is the Docker image getting gunicorn included?

Speaking of simplifying releases, I think this PR could use a little documentation in the README:

  1. How dependencies work (mainapp vs library)
  2. How to push a new release (asking selfishly; I assume it's no longer setup.py build bdist_wheel && twine upload dist/*)
  3. requirements.txt generation instructions for maintainers
  4. Docker image location: I don't have access to kennethreitz/httpbin in dockerhub so perhaps we should update to the new github container registry location

All in all, I agree with the decisions you have made in the PR.

@exhuma

exhuma commented Aug 15, 2023

Something that would make packaging easier is if we could limit it to two officially supported use-cases:

  • Installation as library via pip
  • Running as standalone-process via docker

This would completely remove the need for a "lock" file. Projects using it as a library do not need one anyway, and an official docker image already has all dependencies locked into the image.

Maintaining standalone apps in Python is a bit tricky as you never know the state of the end-user system. One of my colleagues at work maintains one such project and it is constantly giving us headaches. We are thinking of using FlatPak or Appimage to package and distribute the application in the future to have more reproducible installations.

@kevin1024 What are your thoughts on this? Would you be okay with limiting the official distribution to the two use-cases mentioned above?

exhuma added 2 commits August 15, 2023 10:37
References psf#17

The docker-image has all dependencies "frozen" in place, so a
requirements.txt is not really needed.

It would be difficult (but possible) to ensure that the docker-image
uses the same `requirements.txt` as the one that was published. It adds
a considerable overhead to the build complexity and is not really
needed. Supporting multiple end-user environments (and
operating-systems) would also require *multiple* such files. And even
then it is not guaranteed to work everywhere.

Limiting the officially supported targets to "library" and "docker"
removes those complexities.
@exhuma

exhuma commented Aug 15, 2023

I've made a bunch of changes and addressed the above comments. I added a new section "Maintenance" to the README which should help.

With this PR, releases would be automatically triggered when the following conditions are met:

  • A tag named release-* is pushed
  • The event is a "push" (to avoid accidental releases during other events). This could be extended to also include webhooks if a release needs to be re-triggered without code-changes. But in my experience, it is better to also increment the version (f.ex. with a "post" suffix) to avoid overwriting anything that has already been released. See ci.yml#L78
  • [For pypi and docker-hub] the correct secrets are defined in the project settings
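
The trigger part of such a workflow could be sketched as follows (a hypothetical snippet, not necessarily the PR's exact ci.yml):

```yaml
on:
  push:
    tags:
      - "release-*"   # only pushed tags named release-* trigger a publish
```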

Concerning the docker-image location: the workflow is currently configured to use the GitHub registry. But using docker-hub would be beneficial. It makes the "docker pull" command a tiny bit easier for end-users.

@exhuma

exhuma commented Aug 15, 2023

Concerning the docker-image location: the workflow is currently configured to use the GitHub registry. But using docker-hub would be beneficial. It makes the "docker pull" command a tiny bit easier for end-users.

Actually.... scratch that... I have updated the README with the appropriate docker image-names (using the GitHub registry). It's not that much more difficult.

I also provided a "latest" tag to allow pulling an image without specifying a tag.

I do not know if the GHCR has any limitations compared to docker-hub. And, after all this fiddling around I'm not really in the mood to dig into that topic.... so there's that :)

@kevin1024

Would you be okay with limiting the official distribution to the two use-cases mentioned above?

I am 100% OK with that. There may be other maintainers on this project that feel otherwise though.

@nateprewitt, @sigmavirus24, and @timofurrer if you have opinions feel free to weigh in; otherwise I'm happy to keep this project moving. I just don't want to overstep.

@sigmavirus24

This all works for me. And thank you for the valuable input. This is why I marked it as a draft, and it gives me new topics to look into and brush up on.

I fully agree with only keeping direct dependencies. I was not diligent enough in verifying them all when I merged the existing Pipfile and setup.py. I'll pass over them again.

Concerning the requirements.txt file: I feel like there might be a learning opportunity for me here. Considering that the "executable" package is provided via a docker image, and the library is provided by pypi: in which case would an end-user need the requirements.txt? If I just want to run httpbin I can use the docker image. If I want to add it to my project, I just add httpbin to my dependencies. In neither case do I need requirements.txt. From my understanding, that file is useful for reproducible builds. But those are already "baked into" the docker image. All I can see is the need to re-build a new docker image from the same state. But does that happen often enough to add another "moving part" to the build process?

I'm not against using a requirements file. I'm just trying to avoid adding something that does not bring large enough added value. But maybe I am not seeing something. In which use-case would this become useful?

If the requirements.txt file is auto-updated, we would also need to rebuild the docker container each time that happens, or else both states drift apart. If we do want to auto-rebuild the docker image each time the requirements file updates, we also need to auto-generate version numbers. They could be "post" releases, I suppose. This also risks generating a lot of releases.

So here's what I'm thinking about specifically with a requirements.txt. Let's say we have a requirements-unpinned.txt. That holds the top-level dependencies we care about:

Flask
flasgger
importlib-metadata
six

We can pip-compile that into a requirements.txt file that we then consistently test against which includes the pinned transitive dependencies. Then our pyproject.toml could allow library users to just have those dependencies that they install at whatever version they care about.
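
As a sketch, that flow would be a single pip-compile invocation (file names as proposed above; flags from pip-tools' documented interface):

```sh
# requirements-unpinned.txt holds only the top-level dependencies;
# pip-compile resolves and pins the full transitive set.
pip-compile --generate-hashes \
    --output-file requirements.txt \
    requirements-unpinned.txt
```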

That way, if we release a Docker image today and the base image we rely on has a CVE reported in it tomorrow, we could trigger a new build with the same dependencies and an updated base layer. Users then don't have to keep a known CVE in some portion of their stack (test, or whatever), because more and more people and companies are reporting on this kind of data. Users shouldn't have to wait an unknown period of time for us to release a new version that has a new base image.

Alternatively, we don't need to auto-publish a docker image; simply having the Dockerfile could be sufficient for those users. But I want to know that if we build the image today and then again a week from today, with the only intended change being the base image, we would have the same tested dependency configuration between the two (unless we explicitly chose to upgrade it). Otherwise, some transitive dependency could update and break everything between those two image publishes without us noticing.

@exhuma

exhuma commented Aug 17, 2023

@sigmavirus24 I see what you mean. In this case we could also directly use pip-compile on the pyproject.toml. You mentioned earlier that requirements.txt is harder to break. The build-system I chose in this PR is setuptools, which follows PEP-621. And I postulate that the likelihood of that format changing is not much higher than a change in requirements.txt: both formats are already firmly entrenched and can't change much. It would prevent us from having to keep two files up-to-date with the same information.

I'm still trying to figure out how best to handle the "reproducible build" problem for the docker image that you mention.

The CI jobs run on images provided by GitHub (f.ex. ubuntu-latest). The docker image is built using a different base image (f.ex. python:3.10-slim). So in very rare cases it is possible that both would generate different requirements files. I think the chances are small enough that we can accept that risk, so I will take that route.

I will also see whether I can commit that change during the CI run. This way, both the requirements.txt and Dockerfile would be up-to-date and in line with the generated docker-image on the GitHub repo. I was thinking of generating an automated PR instead of committing directly, but this adds more work for maintainers. And it is something that can safely be deferred.

I'll work on that later today.

MANIFEST.in (outdated review thread, resolved)
@exhuma

exhuma commented Aug 17, 2023

The last commit provides a requirements.txt file as "release file". As I had
plenty of reflections going through my head when writing this, here they are
so you can follow that process:

The requirements file is generated early (using the pyproject.toml file as
source). As the file is intended to be used to get a "runnable" main
application, it also pulls in the mainapp extra, but leaves out the test
extra.

The file is then used to bootstrap the environment for unit-testing. As the
test extra is left out it is installed in a separate step. pip will detect
all the "already present" dependencies and skip those. There is a small
possibility that the "test" extra has transitive dependencies that conflict
with the requirements file. I tried adding a --no-deps to the
installation of the test-extra but that didn't work. If this issue ever arises
it could be solved with a constraints file. For now, I decided to accept the
risk in favor of keeping it simple.
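
Should that conflict ever materialise, the constraints-file workaround could look roughly like this (a sketch, not the current CI steps):

```sh
pip install -r requirements.txt            # pinned runtime dependencies
pip install ".[test]" -c requirements.txt  # test extra, constrained to the same pins
```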

If the above issue arises, it only means that the unit-tests are running
against incorrect dependencies. There is no risk that we accidentally package
wrong versions (f.ex. in cases of CVEs).

The "docker build" step then uses that same requirements.txt file. This
ensures that the unit-tests ran against the same libraries as we have in the
image.

Finally, after everything has passed and the commit is tagged as a release,
both the Dockerfile and requirements.txt file are published under the GitHub
releases. This provides an easy download location and a way to rebuild the
image as discussed in #17. They also don't take up any considerable
disk-space.

These "release files" are only provided as a secondary way to build the image
manually. The main releases are still the finished docker-image and pypi
package.

I also added an appropriate section to the README file.

LICENSE (outdated review thread, resolved)
README.md (outdated review thread, resolved)
Comment on lines 51 to 54
sudo apt-get update
sudo apt-get install -y python3-pip
sudo pip install --upgrade pip
sudo pip install yq


Are these necessary? Also, why do we need sudo here?

Author


yq contains tomlq and is a wrapper around jq, which makes it very easy to extract meta-data from a TOML file. I use this to fill in docker labels during the build. This allows us to keep all that meta-data in one single place (the pyproject.toml).
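
For comparison, a similar lookup is possible without yq using Python's stdlib tomllib (3.11+, with tomli as the usual backport); the inline TOML below is a minimal stand-in for the real pyproject.toml, and the variable names are mine:

```python
try:
    import tomllib  # stdlib since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # common backport (assumption: installed)

# Minimal stand-in for the project's real pyproject.toml
PYPROJECT = """
[project]
name = "httpbin"
description = "HTTP Request & Response Service"
"""

meta = tomllib.loads(PYPROJECT)
label_title = meta["project"]["name"]
label_description = meta["project"]["description"]
print(label_title, "-", label_description)
```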

Instead of sudo I can switch to a venv. I will then still need sudo to install the python3-venv package. Since Python 3.11, pip install raises an error even when doing a "user-install", so a venv will be necessary. I will make that change and leave this thread unresolved in case you have additional feedback.

Author


The change to using a venv has been applied.

Dockerfile (outdated diff):
COPY --from=build /opt/httpbin /opt/httpbin
ADD httpbin.bash /opt/httpbin/bin
RUN chmod +x /opt/httpbin/bin/httpbin.bash
EXPOSE 80


We're breaking the API of where to get the image from, right? If we're doing that, why not also drop root and expose a different port? People can always remap them on the outside as they wish.

That would include adding a USER httpbin and running the service in the bash script on a different port, but that should be fine.
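
Sketched as a Dockerfile fragment (the user name matches the suggestion above, but the port number and useradd flags are placeholders):

```dockerfile
# Run as an unprivileged user on an unprivileged port
RUN useradd --system --no-create-home httpbin
USER httpbin
EXPOSE 8080
```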

Author


I was thinking the same thing but wanted to keep the external changes (of the running image) to a minimum.

If you are okay with that change I will gladly do it as it will simplify deployments in non-privileged docker runtimes (like OpenShift) which enforce non-root images.

Author


I have pushed the necessary changes in 4923332

If we decide to keep the old external behaviour (port 80 by default) we can easily revert that commit. With this change, the HTTPBIN_PORT customisation that I provided in #4 becomes moot and we could drop that again. It messes with the EXPOSE information of the image anyway and was only added as a workaround to make the image run on environments that enforce non-root containers.

Should I remove that again? Only one other user expressed interest and it was only released for a short time. I'm generally against dropping something once it's released. But maybe we could make an exception here?

@exhuma

exhuma commented Aug 17, 2023

I saw the failures on Windows. I'm currently away and can look at them tomorrow at the earliest. But in general they are unrelated to the proposed changes; just some PowerShell shenanigans.

@exhuma

exhuma commented Aug 18, 2023

The cleanup of the dependencies I did recently removed a dependency which was needed after all. The image built fine without it, but the process did not start. That dependency (greenlet) now causes issues on Python 3.12. I will have to look into that in detail.

I will also look into a way to test a proper startup of the container in the CI to catch these issues early.

@exhuma

exhuma commented Aug 19, 2023

The Python 3.12 issues come from greenlet (see python-greenlet/greenlet#363).

The fix is, however, only available in a pre-release/alpha of greenlet. To limit the risk of a major version bump (and to an alpha version at that), I only apply that change for Python 3.12 (which is itself still in pre-release) via 715d6f7

Once greenlet has a full/final 3.x release it would be good to simplify these dependencies.
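
In pyproject.toml terms, such a Python-version-specific pin can be expressed with PEP 508 environment markers (the exact version strings here are illustrative, not the PR's actual pins):

```toml
[project]
dependencies = [
    # alpha pin only on the pre-release Python
    'greenlet >= 3.0.0a1 ; python_version >= "3.12"',
    'greenlet < 3 ; python_version < "3.12"',
]
```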

@exhuma exhuma marked this pull request as ready for review August 19, 2023 12:02
@exhuma

exhuma commented Aug 19, 2023

From what I can tell there are no big issues left in this PR. The biggest question would be the list of maintainers. I don't know who would qualify as a maintainer. People with write access to the pypi repo, maybe?

The other open questions are "fine-polish" in my opinion. All should be fine even without addressing them; they can always be handled down the road, even after the PR is done. Even the maintainer question can be handled later. The current release mentions Kenneth alone anyway, so it's not like it's removing or breaking anything existing.

As a consequence I wouldn't mind if we could put a bow on this PR and finish it up.

@kevin1024

Looks great! Thanks to @exhuma for all your work and to @sigmavirus24 and @nateprewitt for your review!

@exhuma

exhuma commented Aug 20, 2023

I too want to thank @sigmavirus24 for the in-depth and detailed review.

@exhuma

exhuma commented Aug 20, 2023

@kevin1024 in case you need further help, feel free to reach out to me. Either via my committer e-mail or just by assigning an issue to me.

@exhuma exhuma deleted the housekeeping branch August 20, 2023 16:13