Can't run website in a container via make container-serve due to the image absence #49460

Closed
shurup opened this issue Jan 16, 2025 · 32 comments · Fixed by #49610 or #49624
Labels
kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/docs: Categorizes an issue or PR as relevant to SIG Docs.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@shurup
Member

shurup commented Jan 16, 2025

This is a Bug Report

Problem:

Running make container-serve in kubernetes/website:main (to run the website locally in a container) leads to an error:

Unable to find image 'gcr.io/k8s-staging-sig-docs/k8s-website-hugo:v0.133.0-af5f894e895c' locally
docker: Error response from daemon: manifest for gcr.io/k8s-staging-sig-docs/k8s-website-hugo:v0.133.0-af5f894e895c not found: manifest unknown: Failed to fetch "v0.133.0-af5f894e895c" from request "/v2/k8s-staging-sig-docs/k8s-website-hugo/manifests/v0.133.0-af5f894e895c".
See 'docker run --help'.
make: *** [Makefile:119: container-serve] Error 125

It happens on Linux/amd64. This behaviour is confirmed by a few people.

Proposed Solution:

Following the #49444 discussion, this should be fixed not by running make container-image beforehand (to build the images locally), but by making Hugo images available to pull from GCR again. As @sftim noted in Slack, they might be absent due to a recent Docsy upgrade. We need to have them back.

@shurup shurup added kind/bug Categorizes issue or PR as related to a bug. sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Jan 16, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 16, 2025
@niranjandarshann
Contributor

niranjandarshann commented Jan 16, 2025

I am also facing the same issue.

@sftim
Contributor

sftim commented Jan 16, 2025

/triage accepted
/priority important-soon

Only affects contributors, not website visitors, but we should get a fix in place; we may need to revisit how we build and publish the container image(s)

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 16, 2025
@sftim sftim pinned this issue Jan 17, 2025
@sftim
Contributor

sftim commented Jan 21, 2025

Until we fix this, there are two workarounds you can use:

  • reference an older container image
    make container-serve CONTAINER_IMAGE=gcr.io/k8s-staging-sig-docs/k8s-website-hugo:v0.133.0-a5ef70d3da97
  • build your own image
    make container-image # only needed one time
    make container-serve

If you like helping out, SIG Docs can guide you towards working on a longer term fix. The best place to offer help is the #sig-docs channel on Slack.

@SayakMukhopadhyay
Contributor

SayakMukhopadhyay commented Jan 22, 2025

Got the link to the failing Prow job from Jan 15: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/post-website-push-image-k8s-website-hugo/1879257326494420992

A cursory glance makes me think that the issue might have something to do with npm and package.json, but I will need to look into it more.

EDIT: Maybe it's related to ed1cc81

@sftim
Contributor

sftim commented Jan 22, 2025

We should check that the container build works for AArch64 (ideally: we also add some CI checks to reject PRs that break image builds)

I think it builds fine for AMD64, but maybe only for AMD64. In the cloud we build a multiarch image.

@ameukam
Member

ameukam commented Jan 23, 2025

@sftim
Contributor

sftim commented Jan 23, 2025

I (still) suspect that a local AArch64 build would also fail.

@ameukam
Member

ameukam commented Jan 24, 2025

Yeah. I see:

#24 84.50 npm error npm error ld-linux-aarch64.so.1: /root/.npm/_cacache/tmp/git-cloneXXXXXXMFelHl/node_modules/hugo-extended/vendor/hugo: Not a valid dynamic program
#24 84.50 npm error npm error ✖ Hugo installation failed. :(
#24 84.50 npm error npm error node:internal/errors:984
#24 84.50 npm error npm error   const err = new Error(message);

Looks like an issue with NPM rather than the infrastructure. Possibly the Alpine image no longer has this lib for aarch64.

@SayakMukhopadhyay
Contributor

SayakMukhopadhyay commented Jan 30, 2025

I was able to reproduce this locally when building a multi-arch image. I did some testing and found that this issue is not caused by AArch64 alone; it is specific to Alpine + AArch64. The hugo-extended binary just doesn't run on Alpine + AArch64. It's not the hugo-extended npm library's fault, as all it does is download the binary and attempt to execute it once. That attempt is what causes npm ci to fail. And this failure is a sign of a bigger problem.

Thing is, only the hugo binary works on Alpine + AArch64, whereas both the hugo and hugo-extended binaries work on Debian + AArch64. Looking into the binaries, here's what the file command gives:

  • hugo-extended-amd64

hugo: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, Go BuildID=diiSa0Qe8VOXD3Ch_MwU/k9QnCktDCLZbp9fOWtYK/kXU2EiN95-OVeBmcnn3D/TED5qvYgX7CVfxfU1NBx, BuildID[sha1]=bb899409ab5a900e5105fae2bbf6ce0dd9b09f3d, stripped

  • hugo-extended-aarch64

hugo: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, Go BuildID=1Tr54E5QnsvcLulJcEu8/zbqdZMB13x98JUbG7mGA/jPKlLSBJNWtSLfS0TLy_/s-5yF5kRotNo1p5bWIJW, BuildID[sha1]=eb227841f2f9dcccdc1c2e61c257f307f8a39c47, stripped

  • hugo-aarch64

hugo: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=WKoC5cPWKFCuD4_Wbl-d/Xk1LdOTak19kxpdVP_Cr/7zko5eJrtSPBTFZZmx1J/GCKiCIfXOUUZzTeSZBJC, stripped

I think alpine is missing something critical since hugo-extended is dynamically linked.
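
The static-versus-dynamic distinction in the `file` output above can also be checked programmatically. The following is a minimal sketch, not something used in this thread: it reads the ELF program headers and returns the loader path recorded in the `PT_INTERP` entry, or `None` for a statically linked binary. The function name `elf_interpreter` is hypothetical, and the sketch assumes a 64-bit little-endian ELF file (which covers x86-64 and AArch64 Linux binaries).

```python
import struct
from typing import Optional

PT_INTERP = 3  # program-header type that names the dynamic loader


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the dynamic loader path an ELF binary requests, or None if static.

    Hypothetical helper: only 64-bit little-endian ELF files are handled.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is supported")

    # ELF64 header: e_phoff is at offset 32; e_phentsize and e_phnum
    # are consecutive 16-bit fields starting at offset 54.
    (phoff,) = struct.unpack_from("<Q", data, 32)
    phentsize, phnum = struct.unpack_from("<HH", data, 54)

    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type != PT_INTERP:
            continue
        # Within a 64-bit program header, p_offset is at +8 and p_filesz at +32.
        (p_offset,) = struct.unpack_from("<Q", data, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        raw = data[p_offset : p_offset + p_filesz]
        return raw.rstrip(b"\x00").decode()

    # No PT_INTERP entry: the binary does not ask for a dynamic loader.
    return None
```

Calling `elf_interpreter(open("hugo", "rb").read())` on the binaries listed above should report the same interpreter paths that `file` prints. A path such as /lib/ld-linux-aarch64.so.1 names the glibc loader, while Alpine is built on musl, which may be relevant here; the thread does not confirm the exact root cause.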

On a side note, existing images built for AArch64 won't work on it as the Dockerfile downloads the amd64 binary only (see https://github.com/kubernetes/contributor-site/blob/148e2934d70ecc89ab2a770407defb973dbab8d4/Dockerfile#L27)

EDIT: Welp, I found this issue gohugoio/hugo#10839 and the "solutions" don't work either, the solutions being adding the libstdc++ package and symlinking ln -s /lib/libc.so.6 /usr/lib/libresolv.so.2.

@sftim
Contributor

sftim commented Jan 31, 2025

On a side note, existing images built for AArch64 won't work on it as the Dockerfile downloads the amd64 binary only (see https://github.com/kubernetes/contributor-site/blob/148e2934d70ecc89ab2a770407defb973dbab8d4/Dockerfile#L27)

That's a link to a different Git repository; did you mean to link there @SayakMukhopadhyay ?

@sftim
Contributor

sftim commented Jan 31, 2025

If we can avoid NPM wanting to download Hugo, that'll help. But not a small fix.

@SayakMukhopadhyay
Contributor

SayakMukhopadhyay commented Jan 31, 2025

On a side note, existing images built for AArch64 won't work on it as the Dockerfile downloads the amd64 binary only (see https://github.com/kubernetes/contributor-site/blob/148e2934d70ecc89ab2a770407defb973dbab8d4/Dockerfile#L27)

That's a link to a different Git repository; did you mean to link there @SayakMukhopadhyay ?

I did mean to point to the contrib site Dockerfile, as that's the one I was testing the npm install on (since it's smaller and faster). But now I see that the website uses Go as a base image and builds Hugo from source. Let me test that, as building from source might be the best option.

@SayakMukhopadhyay
Contributor

SayakMukhopadhyay commented Jan 31, 2025

I did some further digging and have a few observations and potential solutions.

An important point to note is that hugo-extended is not in the package-lock.json. Thus, running npm ci should not install hugo-extended and so shouldn't trigger its postinstall script. But it does. The core issue seems to come from how npm handles Git repos as dependencies: it checks out the Git repo and then runs npm install on that repo. This npm install, and not our npm ci, is what triggers hugo-extended getting installed, which in turn runs its postinstall script, which is responsible for downloading the correct binary and attempting to run it.
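
The claim that a package is absent from package-lock.json can be checked with a few lines of code. The sketch below is illustrative only; the function names `lockfile_mentions` and `lockfile_mentions_file` are hypothetical. It assumes the standard lockfile layouts: a flat `packages` map keyed by install path (lockfileVersion 2 and 3) and a nested `dependencies` tree (lockfileVersion 1).

```python
import json
from typing import Any, Dict


def lockfile_mentions(lock: Dict[str, Any], name: str) -> bool:
    """Return True if `name` appears anywhere in a parsed package-lock.json."""
    # lockfileVersion 2 and 3: flat "packages" map keyed by install path,
    # e.g. "node_modules/foo" or "node_modules/foo/node_modules/bar".
    suffix = "node_modules/" + name
    for path in lock.get("packages", {}):
        if path == suffix or path.endswith("/" + suffix):
            return True

    # lockfileVersion 1 (and the compatibility section of version 2):
    # nested "dependencies" tree keyed by package name.
    def walk(deps: Dict[str, Any]) -> bool:
        for dep_name, meta in deps.items():
            if dep_name == name or walk(meta.get("dependencies", {})):
                return True
        return False

    return walk(lock.get("dependencies", {}))


def lockfile_mentions_file(path: str, name: str) -> bool:
    """Convenience wrapper that loads the lockfile from disk first."""
    with open(path, encoding="utf-8") as fh:
        return lockfile_mentions(json.load(fh), name)
```

For example, `lockfile_mentions_file("package-lock.json", "hugo-extended")` returning `False` would be consistent with the observation above: the package is not locked, yet it still gets installed as a side effect of how the Git dependency is prepared.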

To verify this, I tried installing the deps using yarn v1, and yarn installs the deps without any problems. Do note that I have also checked this with the latest Yarn, but using the latest Yarn is a bit involved. Thus, imo, the issue is with how npm resolves git dependencies.

I can see the following options to solve this:

  1. Switch to using Yarn instead of npm. This will lead to a choice between the latest Yarn or the out-of-maintenance yarn v1. The latest Yarn only works with corepack, which means that we will need to have node in the image so that it can run corepack enable, which is not possible right now. Using the older yarn means that we won't get any security or bug fixes.
  2. Switch to using Docsy as a Hugo module instead. We will only use npm for a couple of dependencies (autoprefixer and postcss). In fact, Docsy prefers being used as a Hugo module (see https://www.docsy.dev/docs/get-started/docsy-as-module/).

My personal preference is to go with Hugo Modules, as that seems to be the recommendation by Hugo too (see https://gohugo.io/hugo-modules/use-modules/). Moreover, yarn does a lot of things very differently from npm (e.g. yarn install won't create the node_modules folder by default and has to be configured to do so).

@sftim
Contributor

sftim commented Jan 31, 2025

Let's do whatever's simpler. However, you might find Hugo Modules harder than you think @SayakMukhopadhyay - we already have a go.mod.

No objection to using yarn within a container image build.

@SayakMukhopadhyay
Contributor

No objection to using yarn within a container image build.

Thing is, AFAIK, to use Yarn within the container, we need to move away from using npm to yarn for the project itself. Yarn uses a different lock file which needs to be committed. Moreover, this also entails having corepack installed and modifying the package.json to depend on yarn everywhere (see the Backstage project as an example, it uses Yarn and corepack, see https://github.com/backstage/backstage/blob/master/package.json#L139 and https://backstage.io/docs/getting-started/#prerequisites).

It can be done, but it's important to note that the change will not be container specific and result in the project needing yarn to get started in all environments.

@biswajeet0192

@sftim I'm trying to build the image locally and run it on macOS ARM64

I face the following issue:

[Screenshot of the error]

@sftim
Contributor

sftim commented Jan 31, 2025

@biswajeet0192 I believe you - but you didn't ask a question.

@SayakMukhopadhyay
Contributor

we already have a go.mod.

OK @sftim, I didn't initially get what you meant by this, but now I understand. Even go mod vendor doesn't work locally, since quite a few modules are replaced with locally referenced paths and some are versioned to a nonexistent 0.0.0.

@Andygol
Contributor

Andygol commented Jan 31, 2025

It seems that the issue of Kubernetes documentation maintenance in general needs to be discussed extensively by the SIG Docs 🤔 to develop a strategy for further action to improve the overall state of maintenance of the entire project.

@biswajeet0192

@biswajeet0192 I believe you - but you didn't ask a question.

@sftim I have attached a reference for my error. After performing the steps you mentioned above to run the repo locally, the issue didn't get resolved.

So I posted this to ask whether there is another way to resolve it on macOS with an ARM64 processor.

@sftim
Contributor

sftim commented Jan 31, 2025

@biswajeet0192 GitHub issues are not really the best place to ask for advice, even about work-arounds. However, you could look at https://github.com/kubernetes/website/?tab=readme-ov-file#using-this-repository and follow the advice there for running locally without a container.

@biswajeet0192

@biswajeet0192 GitHub issues are not really the best place to ask for advice, even about work-arounds. However, you could look at https://github.com/kubernetes/website/?tab=readme-ov-file#using-this-repository and follow the advice there for running locally without a container.

@sftim On my Mac, make container-serve is not working due to the absence of the image, and after running the previous image from the above comments, the issue still persists.

@SayakMukhopadhyay
Contributor

@sftim On my Mac, make container-serve is not working due to the absence of the image, and after running the previous image from the above comments, the issue still persists.

Until this issue is fixed, you will need to build your image using make container-image. Also, the errors in the screenshot seem to stem from not running make module-check first. That is the most I can guess based on the info.

@chalin
Contributor

chalin commented Jan 31, 2025

Let's do whatever's simpler.

@sftim - IMHO the simplest solution is to use the github:google/docsy#v0.6.x Docsy dependency as I had proposed earlier. I only agreed to use the semver approach because we thought that we had a solution, which apparently isn't as general a solution as we thought.

I've been down this path and explored the options, and the simplest is to keep using the tools we have now and change the Docsy dependency so that hugo-extended doesn't get installed no matter how you fetch Docsy as an NPM package. /cc @nate-double-u

@sftim
Contributor

sftim commented Jan 31, 2025

OK, sure - let's do it that way.

@SayakMukhopadhyay
Contributor

@sftim - IMHO the simplest solution is to use the github:google/docsy#v0.6.0 Docsy dependency as I had proposed earlier. I only agreed to use the semver approach because we thought that we had a solution, which apparently isn't as general a solution as we thought.

I've been down this path and explored the options, and the simplest is to keep using the tools we have now and change the Docsy dependency so that hugo-extended doesn't get installed no matter how you fetch Docsy as an NPM package. /cc @nate-double-u

@chalin Just to confirm, do you mean github:google/docsy#v0.6.0 or github:google/docsy#v0.6.x? I ask because github:google/docsy#v0.6.0 will install from the tag, which still has hugo-extended as a devDependency, whereas github:google/docsy#v0.6.x installs from the branch, which has hugo-extended under disabledDependencies.
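
The difference between the two refs comes down to which top-level package.json key lists hugo-extended. A minimal sketch of that check follows; the function name `sections_listing` is hypothetical, and the sample dictionaries in the usage note are illustrative stand-ins, not copies of Docsy's actual package.json.

```python
from typing import Any, Dict, List


def sections_listing(pkg: Dict[str, Any], name: str) -> List[str]:
    """Return the top-level package.json keys whose mapping contains `name`."""
    return [
        key
        for key, value in pkg.items()
        # Only mapping-valued keys (dependency sections, scripts, ...) can list a package.
        if isinstance(value, dict) and name in value
    ]
```

Given a manifest shaped like `{"devDependencies": {"hugo-extended": "..."}}`, the function returns `["devDependencies"]`; for one shaped like `{"disabledDependencies": {"hugo-extended": "..."}}`, it returns `["disabledDependencies"]`. As far as npm's documented manifest fields go, disabledDependencies is not a dependency section npm acts on, which would explain why the branch avoids the install.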

@chalin
Contributor

chalin commented Jan 31, 2025

Right, the v0.6.x branch 👍🏻 sorry for the confusion.

@SayakMukhopadhyay
Contributor

Alright, I will raise a PR to do that and will do the same for the 0.7.2 upgrade.

But since it's a workaround, I have started a discussion over at npm, hoping to get some thoughts on this.

@sftim
Contributor

sftim commented Feb 2, 2025

/reopen

This is not yet fixed.

@sftim
Contributor

sftim commented Feb 3, 2025

Let's hope this time's the charm.

@sftim
Contributor

sftim commented Feb 3, 2025

Also see #49625

@sftim
Contributor

sftim commented Feb 3, 2025

All good (amd64/Linux)
