-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup Node action - Set arch for key value #61010
Conversation
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you ❤️
@@ -28,7 +28,7 @@ runs: | |||
uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1 | |||
with: | |||
path: '**/node_modules' | |||
key: node_modules-${{ runner.os }}-${{ steps.node-version.outputs.NODE_VERSION }}-${{ hashFiles('package-lock.json') }} | |||
key: node_modules-${{ runner.os }}-${{ runner.arch }}-${{ steps.node-version.outputs.NODE_VERSION }}-${{ hashFiles('package-lock.json') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is runner.arch
and why is it needed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runner.arch
is a variable part of the runner context that returns the current architecture of the machine where it is running. For this case X64
.
For this PR the key it returns: node_modules-macOS-X64-v20.12.2-ae23e9300c7a80d4e1aa1bfc1a0c6668f8039ccd9867669545e6617f430c4805
why is it needed now?
I saw the current image we use was updated last week I also believe the transition to M1+ machines might be related. I think it's best to have the architecture to avoid future issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @geriux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I can't find any specific mention of why we need to include the Get Node.js and npm version
or Cache node_modules
steps.
The first step in this workflow uses actions/setup-node, which has a cache
input that we're utilizing, which utilizes actions/cache under the hood. There are also inputs for architecture
supported by that action .
Looking at the log, this already restores the cache based on the runner's architecture. I'd prefer that we actually eliminate our custom caching, unless we can identify why it's included separately.
For some historical context, this was introduced in #45932. |
We could try that! I can't focus on working on that at the moment, but it'd be worth investigating if it's not needed anymore 👍 Do you foresee any issues with having the |
I appreciate wanting to fix tests quickly, but 35 minutes between PR creation with no communication in the WordPress Slack asking if anyone is spending the time reviewing and then merging is a bit fast and we likely have not arrived at the best solution. As a configured code owner of this part of the code base, it's pretty disappointing that a reasonable amount of time to actually investigate was not given or a courtesy ping didn't happen. I have some additional questions as well.
|
Also for future commits, if you could please review the requirements for authoring a merge commit as it relates to giving props for contributions to a given change. It's very important for giving accurate attribution to contributors, and a requirement for all maintainers. |
@desrosj My bad as well. Usually if tests start failing on PRs I err on the side of getting in a fix sooner to unblock people, and prefer to follow up with a better solution if needed. Happy to revert this if you prefer that, though it means tests on PRs will fail until there's a better fix. |
Sorry that I came across as overly critical or as if I were gate keeping. I was actively investigating and came back to a closed PR and got a bit frustrated in the moment. I do appreciate everyone trying to fix the problem, and I also agree fixing tests ASAP is preferred. But a few extra hours of failing tests to understand the root cause is a good tradeoff when possible to make sure we're not making something worse or painting over a deeper problem. One of the things that I like about Trac is there must be a ticket before a change can be made. That's not something that is enforced on GitHub. It creates a nice "home" for all the related discussion even after a merge. I also wish there was a way to indicate you were actively working on something that's not assigning in GitHub. I think that could folks looking to merge something know if another contributor is actively working on providing a review for something. As for the original issue, I have been digging a bit deeper and I'm still not exactly sure what caused the problem. I'm writing these notes out as I walk through the investigation process, so everything is in sequence for how I looked into and discovered it. I believe that it could somehow related to the image update that was published on the 18th, but I'm not so sure that it is architecture related. I checked the project's dependencies, and When new runner images are merged, they are deployed over the course of 2-3 days. So while there are periods of time where different workflow runs could run on different images, this usually is only impactful when So, changing the key fixed the problem. But if it's using the same architecture, what else is going on here? I think it's also unlikely that the runner image update on the 12th noted above is the culprit. There were successful builds for some time using that Nothing obvious stands out in the release details, just some minor version bumps for a small number of tools. I tried to find the actual workflow that was responsible for creating the cache being used. To do that, I looked at the cache management screen for the repository (it is privileged, so not everyone can view it). I found the cache key causing the problem and it gives the time it was originally created. I looked at all of the workflows around that time and could not find any that successfully wrote to the cache. There were several that hit the cache (so no attempt to write was made), and all others displayed an error trying to write to the cache key as if a different workflow had reserved the key for writing to. I was wondering if there was somehow a corrupted or bad cache, so I deleted the two related cache items and reran some of the old runs that failed, and they successfully created a new cache for that key and passed. I reran a good number more and they also passed. 🎉 So in summary: the data stored in the cache key was likely incomplete, causing packages to be missing and resulting in the failure. I'm assuming whichever run attempted to write to the key failed, and it caused the waterfall of test failures. So this PR once merged essentially flushed the cache, ensuring a freshly generated one was used going forward. I don't think we necessarily need to revert this PR. Including the architecture won't hurt, and could potentially prevent an issue once some workflows start using Looking back at #45932, the At this point, all of the cache keys have cycled anyway, and we need to perform a larger audit of our cache keys in general. Each repo is only allotted 10GB total, and we frequently sit at 20GB+. This results in fewer cache hits overall because older cache keys are evicted. I'd love to find ways to share fewer caches more broadly to increase our overall hit rate and speed things up. But that's something for another issue! |
No worries, I agree leaving some time is good, in this case it was done quicker because of failing tests on PRs, which blocks merges and leaves contributors guessing if something in their PR went wrong. I always think it’s easy to revert or change a fix like this sooner or later afterwards, I think that's fine. There’s no damage done? At the end of the day, you are a code owner and you see it pass by? There are plenty of fixes/enhancements I see pass by that I think could be fixed in a better way, and I often go in later if I would like it changed, I think that’s fine? (Unless it's big architectural changes of course.) As for issues for each GH PR, I’m not sure I agree, PRs are very similar to Tac tickets where commits are linked? There can be discussion, assignment, labels and so on. |
Thank you for the detailed information from your investigations @desrosj!
Sorry for that, I'll keep it in mind to always include it. 👍
Interesting! I wonder why that happened 🤔 the idea of adding the architecture to the cache key came from this PR I worked on a few months ago trying to use the newer MacOS 14 runners, but I kept encountering the
I like that idea!
That'd be great, or as you said in another comment if we could remove the custom caching and use what's built-in, even better. Let me know if there's something I can help with or if you'd prefer to remove the architecture from the key for now, happy to open a PR for that. Thanks again! |
What?
This PR updates the
key
used for the cache key for thenode_module
dependencies.Why?
Currently the React Native E2E tests are failing, they use the Mac OS runner that appears to have been updated recently.
How?
By including the architecture the runner image is running so it doesn't bring a stored cached that was built for a different architecture/environment.
Testing Instructions
CI checks should pass.
Testing Instructions for Keyboard
N/A
Screenshots or screencast
N/A