Infrastructure for Orka (2024 and beyond) #3686

UlisesGascon · 2024-04-19T17:58:22Z

UlisesGascon · 2024-04-19T18:34:26Z

Current Orka state

updated on April 19, 2024

SSH port	Node: macpro-4	Node: macpro-5	Node: macpro-6
8822	release-macos11-x64-1	empty	test-macos11-x64-1
8823	empty	empty	test-macos11-x64-2
8824	empty	test-macos1015-x64-2	test-macos1015-x64-1
8825	empty	empty	empty

UlisesGascon · 2024-04-19T18:53:23Z

Next Orka state

updated on April 22, 2024

Intel Nodes

SSH port	Node: macpro-4	Node: macpro-5	Node: macpro-6
8822	release-macos11-x64-1	test-macos13-x64-2	test-macos11-x64-1
8823	test-macos13-x64-1	release-macos13-x64-1	test-macos11-x64-2
8824	empty	test-macos1015-x64-2	test-macos1015-x64-1
8825	empty	empty	empty

ARM Nodes

We assume that ARM nodes can handle only 2 VMs each, not 4+ as the Intel nodes did in the past, due to license limitations. AFAIK this still needs to be confirmed with support.

SSH port	Node: arm-1	Node: arm-2	Node: arm-3
8822	test-macos11-arm64-1	release-macos13-arm64-1	empty
8823	release-macos11-arm64-1	test-macos13-arm64-1	test-macos13-arm64-2

How are the NearForm machines "relocated"?

release-nearform-macos11.0-arm64-1 -> release-orka-macos11-arm64-1
test-nearform-macos11.0-arm64-1 -> test-orka-macos11-arm64-1
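The renaming above follows a simple pattern: the `nearform` segment becomes `orka` and the `.0` after the macOS major version is dropped. A minimal sketch of that mapping, assuming any other relocated machine would follow the same pattern as the two listed here (`relocated_name` is a hypothetical helper, not part of the build tooling):

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the Orka name from a NearForm machine name.
# Assumes the pattern of the two examples above: "-nearform-" becomes
# "-orka-" and the ".0" after the macOS major version is dropped.
relocated_name() {
  printf '%s\n' "$1" | sed -E 's/-nearform-macos([0-9]+)\.0-/-orka-macos\1-/'
}

relocated_name release-nearform-macos11.0-arm64-1
relocated_name test-nearform-macos11.0-arm64-1
```

Names that do not contain the `-nearform-macosNN.0-` segment pass through unchanged.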

targos · 2024-04-22T14:59:13Z

> release-macos13-x64-2
> release-macos13-arm64-2

I don't think it's necessary to have two identical release machines.

targos · 2024-04-22T15:00:36Z

> test-nearform-macos11.0-arm64-1

Are these typos?

UlisesGascon · 2024-04-22T15:38:27Z

Great feedback @targos! I updated the tables.

> I don't think it's necessary to have two identical release machines.

We have space for redundancy, but let's remove them for now.

> Are these typos?

I made a better reference for the "relocated" machines.

targos · 2024-05-02T13:31:07Z

> release-macos13-x64-2
> release-macos13-arm64-2
>
> I don't think it's necessary to have two identical release machines.

Actually, I think we should have one x64 and two arm64 machines, because there are two jobs that run on macos-arm64 during a release (osx11-release-pkg and osx11-arm64-release-tar).

ryanaslett · 2024-05-17T14:58:23Z

Some questions/thoughts/suggestions:

Requirements question: do we still need to support macOS 10.15 and/or 11? In https://github.com/nodejs/node/blob/main/BUILDING.md#supported-platforms I see:

> Node.js does not support a platform version if a vendor has expired support for it. In other words, Node.js does not support running on End-of-Life (EoL) platforms. This is true regardless of entries in the table below.

And the table lists macOS >= 11.

That table may be outdated, as it seems that macOS 11 was EOL as of November 2023.

ARM support in Orka:

> We assume that ARM nodes can handle only 2 VMs each, not 4+ as the Intel nodes did in the past, due to license limitations. AFAIK this still needs to be confirmed with support.

https://orkadocs.macstadium.com/docs/apple-arm-based-support confirms this:

> IMPORTANT
>
> You can deploy up to 2 VMs per Apple silicon-based node.

From what I can gather macOS infra seems to be brittle, with nodes often running into disk issues/maintenance issues.

#3592
#3685
(https://github.com/nodejs/build/issues?q=is%3Aissue+macos+is%3Aclosed+disk) etc.

My suggestion to avoid Jenkins worker decay is to lean into an ephemeral node strategy so that each build has a fresh Orka instance to run on.

We can do that with the following Jenkins plugin for Orka:
https://plugins.jenkins.io/macstadium-orka/#plugin-content-ephemeral-agents

We would first need to set up a Packer build process to create our VM images, so that Orka has a baseline image to create instances from:
https://orkadocs.macstadium.com/docs/packer

The Packer process can leverage our existing Ansible playbooks:
https://developer.hashicorp.com/packer/integrations/hashicorp/ansible/latest/components/provisioner/ansible

This strategy would require that we have an Orka 3.0 cluster. Rather than trying to upgrade the existing cluster, I propose that we ask MacStadium to allow us to provision a new cluster with the resources we need in it (enough Arm/Intel backing nodes for our macos11/13 testing and release), get it built/provisioned and working, and then decommission/return all the existing MacStadium/Orka machines.

I believe we would end up using roughly the same amount of resources, so it should be palatable for MacStadium to support this transition.

mhdawson · 2024-05-21T15:54:19Z

> This strategy would require that we have an Orka 3.0 cluster. Rather than trying to upgrade the existing cluster, I propose that we ask MacStadium to allow us to provision a new cluster with the resources we need in it (enough Arm/Intel backing nodes for our macos11/13 testing and release), get it built/provisioned and working, and then decommission/return all the existing MacStadium/Orka machines.

+1 from me if MacStadium will support that.

We are in the process of updating macOS to version 13 in the Jenkins CI, but unfortunately this is taking longer than expected. Add it to the GitHub actions test matrix so that we have some coverage.

Refs: nodejs/build#3686
PR-URL: #56307
Reviewed-By: Yagiz Nizipli
Reviewed-By: Richard Lau
Reviewed-By: Chengzhong Wu
Reviewed-By: Joyee Cheung
Reviewed-By: Luigi Pinca

targos · 2024-12-21T11:58:31Z

Some interesting news, coming from nodejs/node-v8#295 and a Slack chat with @joyeecheung:

- V8 is currently built with Xcode 16.1 (macOS SDK 15.1), with a compatibility target of macOS 11.0 (source: https://source.chromium.org/chromium/chromium/src/+/main:build/config/mac/mac_sdk.gni)
- We have the same compatibility target
- In the release CI, we have two different Xcode versions (source: https://ci-release.nodejs.org/job/iojs+release/10687)
  - osx13-x64-release-tar uses version 16.0 (clang-1600.0.26.3). Builds of Node.js with latest V8 are successful with it.
  - osx13-arm64-release-tar uses version 14.3 (clang-1403.0.22.14.1). Builds of Node.js with latest V8 fail with it.

That said, I suggest:

- We bump the official minimum version required to build Node.js to at least Xcode 16.1 (we could decide on a later version too).
- We make sure the decided version is the one used in the CI jobs (release and test, x64 and arm64).
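One way to check that a CI machine has the agreed toolchain is to compare the Apple clang build number against a minimum. This is only a sketch: the helper names are hypothetical, and the default threshold of 1600 is an assumption taken from the x64 release job above (the build known to succeed), not an agreed minimum.

```shell
#!/usr/bin/env bash
# Extract the Apple clang build major (e.g. 1600) from a version string such
# as "Apple clang version 16.0.0 (clang-1600.0.26.3)".
clang_build_major() {
  printf '%s\n' "$1" | sed -nE 's/.*clang-([0-9]+)\..*/\1/p'
}

# Succeed when the toolchain is at least the required build major.
# The default of 1600 is an assumption based on the x64 release job above.
toolchain_ok() {
  local major
  major="$(clang_build_major "$1")"
  [ -n "$major" ] && [ "$major" -ge "${2:-1600}" ]
}

# On a build machine the string would come from: clang --version | head -n 1
for v in "clang-1600.0.26.3" "clang-1403.0.22.14.1"; do
  if toolchain_ok "$v"; then echo "$v: ok"; else echo "$v: too old"; fi
done
```

With the two builds listed above, the first is reported as ok and the second as too old.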

targos · 2024-12-21T12:04:39Z

Note that officially (according to https://developer.apple.com/download/applications/), Xcode 16.1 requires at least macOS 14.5 to run, and according to Wikipedia, Xcode 16.0 did too. So I don't know how the osx13-x64-release-tar job is able to run, but it may be risky not to upgrade macOS to a supported version.

joyeecheung · 2024-12-21T13:24:00Z

I'm away from my machine that has macOS 13 + Apple Clang 14, so I can't provide more details until after the holidays, but FWIW: when I tried to install the latest system update for 13, the only available update was the upgrade to Sequoia, and nothing else showed up when I looked for the last compatible update of Xcode or the command line tools via the App Store or Software Update/softwareupdate --list. If it is somehow possible to run macOS 13 with Xcode 16, we would likely need to document how to install it, or contributors on macOS 13 may have a hard time getting it to build (or, if it just doesn't work, we need to tell contributors to upgrade to Sequoia).

targos · 2024-12-21T13:27:32Z

This is how we manually install Xcode on the build machines: https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#full-xcode

joyeecheung · 2024-12-21T13:41:06Z

Also my 2 cents: V8 uses (almost) tip-of-tree clang, so that's currently clang 20, and they have been doing a lot of C++ modernization that lower versions of clang aren't very good at parsing. I did quite a bit of patching to make V8 build on macOS 13 and Clang 14 in https://github.com/joyeecheung/node/tree/fix-macos-13 and many of the fixes don't look very acceptable upstream because they basically just revert the modernization. If we are upgrading the build system, the least-friction route would probably be to just require Sequoia and Xcode 16 to build, though we can keep targeting 11. The lower the macOS version we need to support, the harder it is to install higher versions of Apple Clang on it, and the C++ feature gap will keep widening as V8 uses ToT Clang.

UlisesGascon · 2024-12-21T14:39:22Z

> If it is somehow possible to run macOS 13 with Xcode 16, we would likely need to document how to install it, or contributors on macOS 13 may have a hard time getting it to build (or, if it just doesn't work, we need to tell contributors to upgrade to Sequoia).

For the new Orka machines, we are using Packer, and the instructions include some manual steps on how to install it that are replicable on local machines as well:
https://github.com/nodejs/build/tree/main/orka/templates#manual-steps-for-the-release-images

We probably want to update the commands and ensure that we are using the correct version 👍

anonrig · 2025-01-12T02:35:19Z

Is there any update/progress on this issue?

UlisesGascon · 2025-01-20T20:34:02Z

Let me ping @ryanaslett! AFAIK we were testing the new ephemeral instances and waiting for a HW upgrade in the new cluster so that we can decommission the old VMs and move all the workloads for both CI environments, but I'm not sure whether this was completed.

anonrig · 2025-01-25T00:58:45Z

This issue is currently the only blocker for adding URLPattern to Node.js - nodejs/node#56452

richardlau · 2025-01-27T17:43:46Z

Linking this to openjs-foundation/infrastructure#17.

ryanaslett · 2025-01-30T07:50:36Z

Update:

The images that back the test instances on our cluster are in a state where I believe we can unblock our blocked PRs.

> If it is somehow possible to run macOS 13 with Xcode 16, we would likely need to document how to install it, or contributors on macOS 13 may have a hard time getting it to build (or, if it just doesn't work, we need to tell contributors to upgrade to Sequoia).

The instances run macOS 13.0 with Xcode 16 installed (on both the Arm and Intel images). Xcode 16 was installed via the command line (despite Apple's compatibility matrix) and it appears to be working fine for our purposes.

I agree that it's likely not ideal from a contributor perspective to trick a Ventura machine into running Xcode 16, which works fine for compiling but likely doesn't run at all in the GUI.

> In the release CI, we have two different Xcode versions

That was an oversight: Xcode 16 didn't fully get installed on the Arm image before it was deployed. I have an updated release image with Xcode 16 prepared, but I'm hesitant to deploy that change, since today I also updated those release images to renew the expired signing certificates and I wouldn't want to impact the current release schedule any further than it already has been. I can deploy it once the releases have actually happened.

The testing images can be enabled as soon as I get enough consensus that it's acceptable to run tests on macOS 13 with Xcode 16.

If we decide those ought to be Sequoia and Xcode 16, I can make images for those, but that'll likely require more time because we'll likely want all four images (test/release x Arm/Intel) to be updated.

ryanaslett · 2025-01-30T15:12:12Z

I noticed several attempts at re-running some PRs on macOS. I hadn't yet turned on the osx13 labels on the Jenkins jobs. I went ahead and did that so we can see the results.

targos · 2025-01-30T15:54:26Z

I see osx13-arm64 and osx13-x64 labels in both node-test-commit-osx and node-test-commit-osx-arm jobs.
This seems redundant. Should we remove node-test-commit-osx-arm from node-test-pull-request?

ryanaslett · 2025-01-30T18:00:25Z

> I see osx13-arm64 and osx13-x64 labels in both node-test-commit-osx and node-test-commit-osx-arm jobs. This seems redundant. Should we remove node-test-commit-osx-arm from node-test-pull-request?

Agreed. I've configured node-test-commit-osx to run both osx13-arm64 and osx13-x64 labels, and have disabled the node-test-commit-osx-arm on node-test-pull-request. We can remove it entirely once all the dust settles.

ryanaslett · 2025-01-30T18:02:55Z

A separate but related question about the macOS instances: should these jobs be using ccache? We have shared storage available, so that when an ephemeral instance is launched it can have an existing, warmed ccache. But it doesn't look like the osx11/osx10.15 machines were using ccache either, so I'm unsure whether it's applicable to builds on macOS.

richardlau · 2025-01-30T18:52:18Z

They're supposed to be.

e.g. from today's https://ci.nodejs.org/view/Node.js%20Daily/job/node-daily-v18.x-staging/453/
https://ci.nodejs.org/job/node-test-commit-osx/63373/nodes=osx1015/consoleFull

10:01:27 + export CCACHE_BASEDIR=/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1015
10:01:27 + CCACHE_BASEDIR=/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1015
10:01:27 + export 'CC=/usr/local/bin/ccache cc'
10:01:27 + CC='/usr/local/bin/ccache cc'
10:01:27 + export 'CXX=/usr/local/bin/ccache c++'
10:01:27 + CXX='/usr/local/bin/ccache c++'
10:01:27 ++ getconf _NPROCESSORS_ONLN
10:01:27 + export JOBS=4
10:01:28 + JOBS=4
10:01:32 + NODE_TEST_DIR=/Users/iojs/node-tmp
10:01:32 + FLAKY_TESTS=dontcare
10:01:32 + make run-ci -j 4
10:01:32 python3 ./configure --verbose 
10:01:32 Node.js configure: Found Python 3.7.7...
10:01:32 Detected clang C++ compiler (CXX=/usr/local/bin/ccache c++) version: 11.0.3
10:01:32 Detected clang C compiler (CC=/usr/local/bin/ccache cc) version: 11.0.3
...

Admittedly the build time would suggest that either build times on macOS are much worse than on other platforms or the ccache setup on macOS is broken.

ryanaslett · 2025-01-30T19:03:15Z

> They're supposed to be.

Ah, yes. I see that. They were not using it in the tests I ran, so I'm not sure what's different between the 'real jobs' and my smoke test setup.

I'm looking into why it's not respecting the ccache env vars. I think I may need to use a .ccache config instead of env vars.
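ccache reads its settings from a configuration file as well as from `CCACHE_*` environment variables, so the config-file route could look roughly like the sketch below. The helper name and all paths are placeholders, not the values used on the cluster; `cache_dir` and `base_dir` are the config-file counterparts of `CCACHE_DIR` and `CCACHE_BASEDIR`.

```shell
#!/usr/bin/env bash
# Write a minimal ccache configuration file into the given directory.
# Arguments: <config dir> <shared cache dir> <workspace base dir>
write_ccache_conf() {
  mkdir -p "$1"
  cat > "$1/ccache.conf" <<EOF
cache_dir = $2
base_dir = $3
EOF
}

# Usage (all paths below are placeholders):
#   write_ccache_conf "$HOME/ccache-config" /path/to/shared/ccache "$WORKSPACE"
#   export CCACHE_CONFIGPATH="$HOME/ccache-config/ccache.conf"
#   ccache --show-config   # confirm which values ccache actually picked up
```

Note that environment variables take precedence over the configuration file, so a stray `CCACHE_*` variable in the job environment would still win.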

Also, we're only getting half the instances for our Arm builds, because the VMs were set to 6 CPUs and those nodes only have 10 available. I've reset them to 5 CPUs, but the queue will have to clear out before it shuts down the 6-CPU VMs and re-creates two 5-CPU VMs.
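The arithmetic behind that change, as a small sketch: the number of VMs per node is the node's CPU count divided by the CPUs per VM, rounded down, and capped by the per-node VM limit. The helper name is hypothetical; the default cap of 2 is the Apple silicon limit quoted from the Orka docs earlier in this thread.

```shell
#!/usr/bin/env bash
# Number of VMs that fit on one node: floor(node CPUs / CPUs per VM),
# capped at a per-node VM limit (default 2, the Apple silicon limit quoted
# from the Orka docs earlier in this thread).
vms_per_node() {
  local fit=$(( $1 / $2 ))
  local cap="${3:-2}"
  if [ "$fit" -gt "$cap" ]; then fit="$cap"; fi
  echo "$fit"
}

vms_per_node 10 6   # 6-CPU VMs on a 10-CPU node
vms_per_node 10 5   # 5-CPU VMs on a 10-CPU node
```

On a 10-CPU node, 6-CPU VMs leave room for only one VM, while 5-CPU VMs allow two.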

mhdawson · 2025-01-30T19:05:58Z

> If we decide those ought to be Sequoia and Xcode 16, I can make images for those, but that'll likely require more time because we'll likely want all four images (test/release x Arm/Intel) to be updated.

I think we should target moving to Sequoia. Since we have successfully depended on the flag to set the lowest target OS to run on, rather than building on the lowest target OS, for macOS over the year, I don't think we should expect any surprises when we make the move.

In the interests of unblocking the CIs, I'm also ok running on osx13 as you have it set up until new images can be built and deployed.

targos · 2025-01-31T07:00:44Z

The builds generally look good, but yeah they definitely need ccache. At least for x64, which takes around 3 hours (vs 20 minutes for arm64!)

ryanaslett · 2025-01-31T09:23:36Z

I had set up the x64 machines to use the ccache earlier today, and when I checked the shared cache directory, it had created all of the hash prefix subdirs, so I assumed it was working.

But I just dug deeper with the ccache debug logs, and it turns out it created those dirs without group write permission, but the processes can only write as group members.

I just did a chmod g+w on the whole cache dir and now all the files are starting to cache. Hopefully this speeds things up a bit.
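A minimal sketch of that fix, with a placeholder path for the shared cache directory. The `CCACHE_UMASK` suggestion is an assumption about how to keep new entries group-writable, not something that was applied here.

```shell
#!/usr/bin/env bash
# Grant group write access to everything under an existing cache directory.
make_group_writable() {
  chmod -R g+w "$1"
}

# Usage (the path is a placeholder for the shared cache directory):
#   make_group_writable /path/to/shared/ccache
#
# To keep newly created cache entries group-writable, ccache can be told
# which umask to use:
#   export CCACHE_UMASK=002
```

Without a suitable umask, directories created later by ccache could end up with the same missing group write bit.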

There does still seem to be something awry with the x64 machines connecting to the shared drive, as doing simple things like an ls takes 22 seconds (whereas on the arm machines it acts like a normal HD).

richardlau added the platform:osx label Apr 19, 2024

UlisesGascon added infra build-agenda platform:osx and removed platform:osx labels Apr 19, 2024

This was referenced Apr 19, 2024

Replacement for NearForm Mac OS machines #3638

Closed

Infrastructure for MacOS 13.x #3240

Closed

UlisesGascon added test ci-release labels Apr 19, 2024

UlisesGascon self-assigned this Apr 19, 2024

mhdawson mentioned this issue Apr 22, 2024

Node.js Build WorkGroup Meeting 2024-04-23 #3689

Closed

targos pinned this issue May 2, 2024

mhdawson mentioned this issue May 13, 2024

Node.js Build WorkGroup Meeting 2024-05-15 #3716

Closed

mhdawson mentioned this issue Jun 3, 2024

Node.js Build WorkGroup Meeting 2024-06-04 #3748

Closed

richardlau mentioned this issue Jun 23, 2024

Increase minimum macOS version nodejs/node#53561

Open

mhdawson mentioned this issue Jun 24, 2024

Node.js Build WorkGroup Meeting 2024-06-26 #3777

Closed

targos mentioned this issue Jul 1, 2024

build: set macos deployment target to 13.3 nodejs/node#53668

Closed

richardlau mentioned this issue Jul 5, 2024

Planning/requirements for Node.js 23 #3807

Closed

17 tasks

mhdawson mentioned this issue Jul 15, 2024

Node.js Build WorkGroup Meeting 2024-07-16 #3831

Closed

mhdawson mentioned this issue Aug 5, 2024

Node.js Build WorkGroup Meeting 2024-08-07 #3850

Closed

UlisesGascon mentioned this issue Aug 9, 2024

Add Jenkins Plugin for Orka #3860

Closed

anonrig mentioned this issue Dec 11, 2024

Status of smartOS support and what future holds nodejs/TSC#1663

Closed

targos mentioned this issue Dec 18, 2024

build: test macos-13 on GitHub actions nodejs/node#56307

Merged

mhdawson mentioned this issue Dec 30, 2024

Node.js Build WorkGroup Meeting 2025-01-01 #3992

Open

mhdawson mentioned this issue Jan 20, 2025

Node.js Build WorkGroup Meeting 2025-01-22 #4003

Closed

UlisesGascon assigned ryanaslett and unassigned UlisesGascon Jan 22, 2025
