
Fix build caching related error LNK4020 #1553

Closed
DanRStevens opened this issue Feb 1, 2025 · 5 comments · Fixed by #1583 or #1603

@DanRStevens
Member

Occasionally builds fail with:

error LNK4020: a type record in 'd:\a\ophd\ophd\nas2d-core.build\release_x86_nas2d\NAS2D.pdb' is corrupted; some symbols and types may not be accessible from the debugger [D:\a\OPHD\OPHD\appOPHD\appOPHD.vcxproj]

This seems to happen when the NAS2D submodule is updated. It's not clear what the exact cause is, though clearing cache entries and re-running the failed workflows can get them to pass. The caches are listed on the repository's Actions cache page.

Using the filter feature, the entries to clear are (note: no space after the colon):

  • key:buildCache
  • key:nas2dCache
  • key:vcpkgCache

It's possible only a subset of those needs to be cleared out; a command-line sketch for clearing them follows below.
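As an alternative to the web UI filter, the GitHub CLI can do the same clean-up from a terminal. A rough sketch, assuming a reasonably recent gh (the exact flags are worth confirming against gh cache --help; the key prefixes match the ones listed above):

# List cache entries whose key starts with a given prefix
gh cache list --repo OutpostUniverse/OPHD --key buildCache
gh cache list --repo OutpostUniverse/OPHD --key nas2dCache
gh cache list --repo OutpostUniverse/OPHD --key vcpkgCache

# Delete a specific entry by its full key, or by the id shown in the listing
gh cache delete <cache-key-or-id> --repo OutpostUniverse/OPHD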


I suspect this may also be related to new vcpkg packages. When the error shows up, there also tend to be newly built vcpkg packages.

Perhaps what we really need is a way to invalidate build caches when new vcpkg packages are available. This might be detected by a version change with:

vcpkg --version

Perhaps the vcpkg version string could become part of the cache key.
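A minimal GitHub Actions sketch of that idea, assuming vcpkg is already on the PATH; the step ids, cache path, and key layout here are illustrative rather than the workflow's actual configuration:

- name: Get vcpkg version
  id: vcpkg-version
  shell: bash
  run: echo "version=$(vcpkg --version | head -n1 | awk '{print $NF}')" >> "$GITHUB_OUTPUT"

- name: Cache vcpkg packages
  uses: actions/cache@v4
  with:
    path: vcpkg_installed/
    key: vcpkgCache-${{ runner.os }}-x64-${{ steps.vcpkg-version.outputs.version }}

The first step just extracts the trailing date-and-commit token from the first line of vcpkg --version, so the key only changes when the tool itself changes.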


On a possibly related note, the baseline value in vcpkg.json may also be of interest.


We might only care when the dependencies we rely on are updated, which may happen less often than the vcpkg tool itself is updated. Using the vcpkg --version value could therefore cause more frequent cache invalidation than is actually necessary.

@DanRStevens
Member Author

The first run of the PR to version lock vcpkg dependencies failed with the LNK4020 error, which clearly indicates the problem has not gone away. However, with version locking of vcpkg dependencies in place, I suspect it will be less common now.

I would fully expect to encounter this error again when updating the version lock for NAS2D and then updating the NAS2D submodule. I'm less certain whether updating the NAS2D submodule on its own is likely to make the problem reappear.


I think what we might want to do is cause a cascade failure of cache restores, so we only restore a cache when all prior caches have been successfully restored:
vcpkg -> nas2d -> ophd
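A sketch of that cascade with actions/cache, gating each restore on the previous step's cache-hit output (step ids, paths, and key fragments are hypothetical, not the workflow's real configuration):

- id: vcpkg-cache
  uses: actions/cache@v4
  with:
    path: vcpkg_installed/
    key: vcpkgCache-${{ runner.os }}-x64-${{ hashFiles('vcpkg.json') }}

- id: nas2d-cache
  # Only restore the NAS2D cache if the vcpkg cache was restored.
  # NAS2D_COMMIT is a placeholder for however the workflow identifies the submodule revision.
  if: steps.vcpkg-cache.outputs.cache-hit == 'true'
  uses: actions/cache@v4
  with:
    path: nas2d-core.build/
    key: nas2dCache-${{ runner.os }}-x64-${{ env.NAS2D_COMMIT }}

- id: build-cache
  # Only restore the OPHD incremental build cache if the NAS2D cache was restored
  if: steps.nas2d-cache.outputs.cache-hit == 'true'
  uses: actions/cache@v4
  with:
    path: .build/
    key: buildCache-${{ runner.os }}-x64-${{ github.sha }}

Note that a skipped cache step also skips its save phase, so after an upstream miss the downstream layers rebuild from scratch and only start caching again once the upstream cache exists, which is the point of the cascade.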

@DanRStevens
Member Author

Despite recent attempts to version lock by setting a baseline, there was still an automatic update to vcpkg dependencies.

After the update, the cache keys for the binary packages had updated hash strings. It's not clear whether new versions of the packages were actually installed, or whether some change to the tooling caused the updated hashes.

Doing a bit of digging, here are some details on package hashes:
microsoft/vcpkg#15075

The per-package hash inputs are recorded in files named vcpkg_abi_info.txt.

After running vcpkg install, such files can be found with:

find vcpkg_installed/ -name 'vcpkg_abi_info.txt'

What's less clear is whether there is a way to calculate this information before installing packages. If we can get accurate hash info before installing, it can become a reliable part of the cache key. To do that, we'd need a hash that depends solely on vcpkg.json and on what a local install of vcpkg would do with that file. There's an assumption here that the local install of vcpkg is at least as new as the baseline; hopefully we wouldn't need to backdate a local copy of vcpkg to an older commit to match the baseline.
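Short of reproducing vcpkg's internal ABI computation, one stand-in that is available before anything is installed is to key on the manifest plus the tool version, since both are inputs to the eventual ABI hashes. A sketch (step id and key layout are hypothetical; this is an approximation, not the real per-package hash):

- name: Compute vcpkg cache key
  id: vcpkg-key
  shell: bash
  run: |
    # Combine a hash of the manifest with the vcpkg tool version, both known up front
    manifest_hash=$(sha256sum vcpkg.json | cut -c1-16)
    tool_version=$(vcpkg --version | head -n1 | awk '{print $NF}')
    echo "key=vcpkgCache-${RUNNER_OS}-x64-${tool_version}-${manifest_hash}" >> "$GITHUB_OUTPUT"

- name: Cache vcpkg packages
  uses: actions/cache@v4
  with:
    path: vcpkg_installed/
    key: ${{ steps.vcpkg-key.outputs.key }}

The vcpkg_abi_info.txt files found by the find command above can still be hashed after the install for comparison, but only as a diagnostic, since the cache key has to be known up front.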


Another idea is to use a version identifier for either vcpkg itself or the GitHub Actions runner image. Usually when the runner image is updated, it includes an updated version of vcpkg, which often means new versions of the dependencies we install. It arguably shouldn't always mean that, so this could cause more cache invalidation and rebuilds than strictly necessary, though it may be close enough.

There is mention of env vars for the GitHub Actions image version here:
actions/runner-images#7879

$ImageOS and $ImageVersion

However, when attempting to use ImageVersion, it didn't appear to have any value set.
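One possible explanation for the empty value: ImageOS and ImageVersion are machine-level environment variables on the hosted runners, and machine-level variables are not part of the env expression context, so ${{ env.ImageVersion }} evaluates to nothing until a run step republishes it. A sketch (step id hypothetical):

- name: Capture runner image version
  id: image
  shell: bash
  run: echo "version=${ImageVersion:-unknown}" >> "$GITHUB_OUTPUT"

- name: Cache vcpkg packages
  uses: actions/cache@v4
  with:
    path: vcpkg_installed/
    key: vcpkgCache-${{ runner.os }}-x64-${{ steps.image.outputs.version }}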

@DanRStevens
Member Author

DanRStevens commented Feb 12, 2025

On the topic of version locking and baselines, a recent run showed vcpkg trying to install new packages:
https://github.com/OutpostUniverse/OPHD/actions/runs/13286055031/job/37094887787

This is despite having successfully restored a recent cache, and no changes to the tooling:
vcpkg package management program version 2025-01-29-a75ad067f470c19f030361064e32a2585392bee2

Cache restored from key: vcpkgCache-Windows-x64-00fc58e29020291fd374aafc403c8590fa625c3e181b59d3d363ff734f63cfdd

After that branch was merged, there was no attempt by vcpkg to update packages when running for main:
https://github.com/OutpostUniverse/OPHD/actions/runs/13286381735/job/37095918169

Again, same cache restore, and no changes to tooling:
vcpkg package management program version 2025-01-29-a75ad067f470c19f030361064e32a2585392bee2

Cache restored from key: vcpkgCache-Windows-x64-00fc58e29020291fd374aafc403c8590fa625c3e181b59d3d363ff734f63cfdd

Since branch caches are not visible when building main, the second run on main would have had no access to the branch's binary package caches.

The binary package caches showed updated hash values.

Not really sure what caused vcpkg to try to build new packages on that branch.


Edit: Shortly after, vcpkg wanted to rebuild packages on main. Again, no change in the tooling.
https://github.com/OutpostUniverse/OPHD/actions/runs/13308934307/job/37166306674
vcpkg package management program version 2025-01-29-a75ad067f470c19f030361064e32a2585392bee2

Cache restored from key: vcpkgCache-Windows-x64-00fc58e29020291fd374aafc403c8590fa625c3e181b59d3d363ff734f63cfdd

@DanRStevens
Member Author

DanRStevens commented Feb 14, 2025

Adding link to a recent run that showed more LNK4020 errors:
https://github.com/OutpostUniverse/OPHD/actions/runs/13331042983

The first attempt at the run failed; caches were then cleared for that branch and each job was re-run. The third attempt shows both jobs completing after the cache was cleared.

The caches restored for the first failed run were:

Cache restored from key: vcpkgCache-Windows-x64-00fc58e29020291fd374aafc403c8590fa625c3e181b59d3d363ff734f63cfdd
Cache restored from key: nas2dCache-Windows-x64-bc0abae348b5c0d4aa3552437e894670b9fff5ae
Cache restored from key: buildCache-Windows-x64-afcb1dfc8af8b52d4befccdee88f955ea5ccd346

This resulted in error LNK4020.

For the 3rd successful re-run of the job:

Cache restored from key: vcpkgCache-Windows-x64-00fc58e29020291fd374aafc403c8590fa625c3e181b59d3d363ff734f63cfdd
Cache restored from key: nas2dCache-Windows-x64-bc0abae348b5c0d4aa3552437e894670b9fff5ae
Cache restored from key: buildCache-Windows-x64-ba0fa9b5c28bfdc9a2fdfe281bd4d963def4bf6a

As can be seen, only the last entry, the OPHD incremental build cache, differed. The cache restored by the failed run came from an earlier branch build, before a forced push. For the third re-run, all three caches would have come from main; the first two caches were identical to the ones from main.

There was an earlier successful build, before a forced push, here:
https://github.com/OutpostUniverse/OPHD/actions/runs/13318018422/job/37196668458
This resulted in saving the cache:

Cache saved with key: buildCache-Windows-x64-afcb1dfc8af8b52d4befccdee88f955ea5ccd346

This appears to be the cache that was restored by the failed build above.


Edit: Noticed a possibly relevant finding. In the run that created the cache (the one that later caused a failure when restored), this appeared earlier in the log:

Cache not found for input keys: nas2dCache-Windows-x64-9258fbc0d95e993d18ec746955da124d018ff31a

It saved a new cache for NAS2D, but the next run restored a different NAS2D cache. It also restored an OPHD cache that had been built against the NAS2D cache from the run that generated it, which did not match the NAS2D cache restored alongside it. Thus OPHD was using a cached incremental build made against a different cached build of NAS2D.

Related: a submodule update was merged into main recently. The rebased changes on the branch mentioned here were based on the updated main, whereas the previously built cache was based on an older version of main, and hence on a different version of NAS2D. That could explain why the newer code was able to pick up a NAS2D cache when the earlier run could not find one.
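One way to prevent that kind of mismatch is to have each downstream cache key embed the key fragments of the caches it was built against, so an OPHD build cache can only be restored alongside the same vcpkg and NAS2D caches it was produced from. A sketch, with hypothetical step ids and with VCPKG_FRAGMENT/NAS2D_FRAGMENT standing in for however the real workflow derives its short hashes:

- name: Compute cache key fragments
  id: keys
  shell: bash
  run: |
    # VCPKG_FRAGMENT and NAS2D_FRAGMENT are placeholders (e.g. a manifest hash
    # and the NAS2D submodule commit), not values the real workflow defines.
    echo "vcpkg=${VCPKG_FRAGMENT}" >> "$GITHUB_OUTPUT"
    echo "nas2d=${NAS2D_FRAGMENT}" >> "$GITHUB_OUTPUT"

- id: nas2d-cache
  uses: actions/cache@v4
  with:
    path: nas2d-core.build/
    key: nas2d-${{ runner.os }}-x64-${{ steps.keys.outputs.vcpkg }}-${{ steps.keys.outputs.nas2d }}

- id: build-cache
  uses: actions/cache@v4
  with:
    path: .build/
    # The OPHD key repeats both upstream fragments, so a restored build cache
    # always corresponds to the vcpkg and NAS2D caches restored in the same run.
    key: ophd-${{ runner.os }}-x64-${{ steps.keys.outputs.vcpkg }}-${{ steps.keys.outputs.nas2d }}-${{ github.sha }}

This is similar in shape to the composite keys quoted in the later run below.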

@DanRStevens
Member Author

Another run failed with error LNK4020:
https://github.com/OutpostUniverse/OPHD/actions/runs/13381865733/job/37371681015

Of note was the following:

Cache restored from key: nas2d-Windows-x64-00fc58e2-cccfc324
...
Cache hit for: ophd-Windows-x64--00fc58e2-cccfc324-bc0b993e
...
Cache restored from key: ophd-Windows-x64--00fc58e2-bc0abae3-0a88fdd9

That cache key should not have been possible.
