
New behavior when max_client_disconnect is used and the worker node moves from disconnected to ready #15483

Closed
ron-savoia opened this issue Dec 6, 2022 · 9 comments · Fixed by #15808 or #16609
Labels: hcc/cst, stage/accepted, theme/edge, type/bug

Comments

@ron-savoia
Contributor

ron-savoia commented Dec 6, 2022

Nomad version

1.3.8/1.4.3

Issue

A new behavior is seen in 1.3.8 and 1.4.3 (verified): for jobs that have max_client_disconnect defined and use a prestart task to ensure the task only runs on the desired node, the allocation changes state from unknown, to running, and then to complete once the node is back in a ready state. Eventually a new alloc will be placed on the desired node, after the job has cycled through other, non-desired nodes; however, this is not the behavior seen in the prior versions tested (e.g. 1.3.3/1.4.2).

Behavior seen when testing with 1.3.8/1.4.3:

After the alloc is placed on the desired node, if the nomad service is stopped on that node:

  • The node status is disconnected and the placed alloc status is unknown, as expected.
  • The alloc status stays as unknown for the duration of the node being disconnected.
  • Once the nomad service is restarted, the node status changes to ready.
  • The alloc initially placed on the preferred node briefly shows a status of running, but within ~25 seconds the status changes to complete and the job is moved to pending.
  • The job stays in pending until a new alloc is placed on the preferred node.

This behavior was only seen when a job uses a prestart task (raw_exec in this example) which ensures the job is placed on the specific/desired node before the main task starts.

Reproduction steps

The same steps, times and job files were used while testing against 1.3.x (1.3.8 & 1.3.3) and 1.4.x (1.4.3 & 1.4.2).

Jobs Used for Testing:

  1. example1.nomad - Generic redis job, max_client_disconnect is not set. Affinity is used so the preferred node will be chosen for the initial placement, and the alloc can be migrated to other nodes if needed.
  2. example2.nomad - Generic redis job, max_client_disconnect is set. A constraint is used so the alloc is pinned to the preferred node.
  3. 91244_mod-2.nomad - Modified redis job, max_client_disconnect is set and a pre-start task is used to ensure that the job is placed on the desired node before the main task starts.

High Level Steps:

  1. Deploy the jobs
  2. Once the allocs are verified running on the specific/desired node, stop the nomad service on the specific/desired node.
  3. Run nomad node status and nomad job status -verbose <JOB_ID> at intervals (a simple loop like the sketch after this list can be used). I used 5 minute intervals, after the nomad service was stopped, for the baseline status. At the 15 minute mark, I started the nomad service on the specific/desired node and again ran nomad node status and nomad job status -verbose <JOB_ID> for example2.nomad and 91244_mod-2.nomad.
  4. Monitor the job status of 91244_mod-2.nomad.
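
Below is a minimal sketch of such a monitoring loop, assuming the three job names from the job files in this issue (example_no_disconnect, example_disconnect, 91244_mod-2) and a 5 minute interval; adjust the job IDs and interval to match your setup:

# Minimal monitoring sketch; the job names and 300s interval are just this test's values.
while true; do
  date
  nomad node status
  for job in example_no_disconnect example_disconnect 91244_mod-2; do
    nomad job status -verbose "$job"
  done
  sleep 300   # 5 minute intervals
done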

Expected Result

In versions prior to 1.3.8/1.4.3, the alloc status behavior of 91244_mod-2.nomad is:

  1. State after initial placement (Initial alloc placed on the preferred node): running
  2. State after the nomad service is stopped on the preferred node (Initial alloc placed on the preferred node): unknown
    State of allocs placed on non-preferred node, after the nomad service is stopped on the preferred node: failed
  3. State after the nomad service is started again on the preferred node (Initial alloc placed on the preferred node): running

5 Minute - Alloc Status

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
636d015e-c81d-2371-646b-5ae756768013  95c4cfde-0704-0adf-f8a0-d7c7e01932fb  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T15:32:42-05:00  2022-12-05T15:32:58-05:00
16394df7-52bd-97f5-34c0-feb610c30546  2458303d-0fbc-012a-1e44-b2cbfa12fba7  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:30:26-05:00  2022-12-05T15:32:42-05:00
f96e2ace-29aa-0d6d-9564-06c78f92c573  dba3f5b2-9eec-330b-f407-ca476b7bd979  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:29:10-05:00  2022-12-05T15:30:26-05:00
556ce590-0158-29d5-b19c-f545b463ec25  34dda7d8-4fd1-162d-c8a7-84c2f89fff4e  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      unknown  2022-12-05T15:28:08-05:00  2022-12-05T15:29:10-05:00

10 Minute - Alloc Status

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
25649dde-3ce4-4fdf-cc2a-658fc933d988  03f60b1e-fdff-6648-a107-0af72a9d0527  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T15:36:58-05:00  2022-12-05T15:37:16-05:00
636d015e-c81d-2371-646b-5ae756768013  95c4cfde-0704-0adf-f8a0-d7c7e01932fb  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:32:42-05:00  2022-12-05T15:36:58-05:00
16394df7-52bd-97f5-34c0-feb610c30546  2458303d-0fbc-012a-1e44-b2cbfa12fba7  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:30:26-05:00  2022-12-05T15:32:42-05:00
f96e2ace-29aa-0d6d-9564-06c78f92c573  dba3f5b2-9eec-330b-f407-ca476b7bd979  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:29:10-05:00  2022-12-05T15:30:26-05:00
556ce590-0158-29d5-b19c-f545b463ec25  34dda7d8-4fd1-162d-c8a7-84c2f89fff4e  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      unknown  2022-12-05T15:28:08-05:00  2022-12-05T15:29:10-05:00

15 Minute - Alloc Status (after nomad service started on the worker node)

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
25649dde-3ce4-4fdf-cc2a-658fc933d988  03f60b1e-fdff-6648-a107-0af72a9d0527  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T15:36:58-05:00  2022-12-05T15:37:16-05:00
636d015e-c81d-2371-646b-5ae756768013  95c4cfde-0704-0adf-f8a0-d7c7e01932fb  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:32:42-05:00  2022-12-05T15:36:58-05:00
16394df7-52bd-97f5-34c0-feb610c30546  2458303d-0fbc-012a-1e44-b2cbfa12fba7  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:30:26-05:00  2022-12-05T15:32:42-05:00
f96e2ace-29aa-0d6d-9564-06c78f92c573  dba3f5b2-9eec-330b-f407-ca476b7bd979  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T15:29:10-05:00  2022-12-05T15:30:26-05:00
556ce590-0158-29d5-b19c-f545b463ec25  34dda7d8-4fd1-162d-c8a7-84c2f89fff4e  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      running  2022-12-05T15:28:08-05:00  2022-12-05T15:43:58-05:00

Actual Result

In versions 1.3.8/1.4.3, the alloc status behavior of 91244_mod-2.nomad is:

  1. State after initial placement (Initial alloc placed on the preferred node): running
  2. State after the nomad service is stopped on the preferred node (Initial alloc placed on the preferred node): unknown
    State of allocs placed on non-preferred node, after the nomad service is stopped on the preferred node: failed
  3. State after the nomad service is started again on the preferred node (Initial alloc placed on the preferred node): Initially the alloc status is running, then moves to complete

5 Minute - Alloc Status

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T13:26:49-05:00  2022-12-05T13:27:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      unknown  2022-12-05T13:21:00-05:00  2022-12-05T13:22:42-05:00

10 Minute - Alloc Status

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T13:31:24-05:00  2022-12-05T13:32:00-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      unknown  2022-12-05T13:21:00-05:00  2022-12-05T13:22:42-05:00

15 Minute - Alloc Status (after nomad service started on the worker node)

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed   2022-12-05T13:31:24-05:00  2022-12-05T13:32:00-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      running  2022-12-05T13:21:00-05:00  2022-12-05T13:37:20-05:00

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status   Created                    Modified
47d9944f-0c74-3a74-aa82-ec4e1d5758f1  071d33fa-8f71-0c21-6a2b-e06c5ea0e1d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      pending  2022-12-05T13:39:59-05:00  2022-12-05T13:40:00-05:00
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:31:24-05:00  2022-12-05T13:39:59-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed   2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        run      running  2022-12-05T13:21:00-05:00  2022-12-05T13:37:20-05:00

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status    Created                    Modified
47d9944f-0c74-3a74-aa82-ec4e1d5758f1  071d33fa-8f71-0c21-6a2b-e06c5ea0e1d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed    2022-12-05T13:39:59-05:00  2022-12-05T13:40:34-05:00
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:31:24-05:00  2022-12-05T13:39:59-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        stop     complete  2022-12-05T13:21:00-05:00  2022-12-05T13:40:35-05:00

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status    Created                    Modified
6ae99016-53a7-b53e-a721-1b50051ec033  688a71e1-66f1-f80b-726d-1c395b4cb8ca  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      pending   2022-12-05T13:56:33-05:00  2022-12-05T13:56:52-05:00
47d9944f-0c74-3a74-aa82-ec4e1d5758f1  071d33fa-8f71-0c21-6a2b-e06c5ea0e1d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:39:59-05:00  2022-12-05T13:56:33-05:00
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:31:24-05:00  2022-12-05T13:39:59-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        stop     complete  2022-12-05T13:21:00-05:00  2022-12-05T13:40:35-05:00

Allocations
ID                                    Eval ID                               Node ID                               Node Name         Task Group         Version  Desired  Status    Created                    Modified
6ae99016-53a7-b53e-a721-1b50051ec033  688a71e1-66f1-f80b-726d-1c395b4cb8ca  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        run      failed    2022-12-05T13:56:33-05:00  2022-12-05T13:57:10-05:00
47d9944f-0c74-3a74-aa82-ec4e1d5758f1  071d33fa-8f71-0c21-6a2b-e06c5ea0e1d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:39:59-05:00  2022-12-05T13:56:33-05:00
81bc6e7b-d290-74d4-0494-da7309a74a6d  fc4f3c31-76b3-aeca-5ce3-fc2b413543d4  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:31:24-05:00  2022-12-05T13:39:59-05:00
645c0ffe-96dd-c92e-d648-60cdbd86a52c  c5f13eda-1c5e-4032-b53c-c86f0bb6aa9b  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:26:49-05:00  2022-12-05T13:31:24-05:00
59824c2f-cad5-1e78-67d8-26efb53970b8  d79bcd13-e6e9-3cde-1d99-942346299ead  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:24:16-05:00  2022-12-05T13:26:49-05:00
49607b18-8cee-2916-8cb5-03d17bea49e7  4c057e43-3690-14cb-6b4c-e87bbd6782c2  a21d8f3e-46ba-abd3-6ff7-967ad68996de  ip-172-31-29-24   91244_mod_group-2  0        stop     failed    2022-12-05T13:22:42-05:00  2022-12-05T13:24:16-05:00
0e1ac64e-b641-c1a7-6444-afa52cccd2aa  3a105768-3092-1d49-34b8-98697f999413  4173785e-c421-3287-51a6-bd35b8cd601b  ip-172-31-20-215  91244_mod_group-2  0        stop     complete  2022-12-05T13:21:00-05:00  2022-12-05T13:40:35-05:00

Job file (if appropriate)

example1.nomad

job "example_no_disconnect" {
  datacenters = ["dc1"]

  affinity {
    attribute = "${unique.hostname}"
    value = "<HOSTNAME_OF_PREFERRED_NODE>"
    weight = 100
  }

  group "cache_no_disconnect" {
    network {
      port "db" {
        to = 6379
      }
    }

    count = 2

    task "redis_no_disconnect" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

example2.nomad

job "example_disconnect" {
  datacenters = ["dc1"]

  constraint {
    attribute = "${attr.unique.hostname}"
    value = "<HOSTNAME_OF_PREFERRED_NODE>"
  }

  group "cache_disconnect" {
    network {
      port "db" {
        to = 6379
      }
    }

    count = 2

    max_client_disconnect = "12h"

    task "redis_disconnect" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

91244_mod-2.nomad

job "91244_mod-2" {
  datacenters = ["dc1"]

  group "91244_mod_group-2" {
    network {
      port "db" {
        to = 6379
      }
    }

    max_client_disconnect = "12h"

    task "91244_mod_pretask-2" {
      driver = "raw_exec"

      lifecycle {
        hook = "prestart"
        sidecar = false
      }

      template {
        data = <<EOH
        #!/bin/bash
        set -e
        echo "Starting maintenance pretask"
        echo $HOSTNAME
        if [[ "$HOSTNAME" = "<HOSTNAME_OF_PREFERRED_NODE>" ]]
        then
        echo "We're good to go, starting redis..." && exit 0
        else
        echo "Wrong host - exiting" && exit 1
        fi

        EOH
        perms = "775"
        destination = "maintenance.sh"
      }

      config {
        command = "/bin/bash"
        args    = ["-c","./maintenance.sh"]
      }

      resources {
        memory = 50
      }
    }

    task "91244_mod_redis-2" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      resources {
        cpu = 500
        memory = 256
      }
    }
  }
}

Nomad Server logs (if appropriate)

Debug bundles taken while testing (1.3.8/1.3.3 & 1.4.3/1.4.2) have been uploaded to: https://drive.google.com/drive/folders/1Ds83JQBQlPQukPELj3Ia_iCUnaaE73Jp?usp=sharing

Nomad Client logs (if appropriate)

@ron-savoia ron-savoia changed the title New behavior in 1.3.8 when max_client_disconnect is used and the worker node is disconnected New behavior in 1.3.8 when max_client_disconnect is used and the worker node moves from disconnected to ready Dec 6, 2022
@ron-savoia ron-savoia changed the title New behavior in 1.3.8 when max_client_disconnect is used and the worker node moves from disconnected to ready New behavior when max_client_disconnect is used and the worker node moves from disconnected to ready Dec 6, 2022
@rikislaw

rikislaw commented Dec 9, 2022

Hi,

I will also rephrase the problem in a shorter form:

  1. We have an unknown allocation on a disconnected node (because max_client_disconnect is set).
  2. Nomad tries to schedule the one missing alloc on a different node (according to the reschedule stanza).
  3. Let's assume that, for whatever reason, the new allocs fail on the other nodes.
  4. The disconnected node becomes ready, and the allocation changes its state from unknown to running and, a second later, to complete!
  5. We have no running allocations!
  6. The Nomad job is in a pending state and will eventually start the missing allocation, but depending on the reschedule configuration this can take some time.

The bug is at point number 4: the allocation shouldn't stop.

BR,
Piotr Richter

@mikenomitch
Contributor

Hey @rikislaw, thanks for the report. We haven't been able to start on this fix yet due to some unrelated issues popping up this last week, but it is pretty high in our queue. Should be picked up soon. Apologies for the regression.

@lgfa29
Contributor

lgfa29 commented Jan 10, 2023

Hi everyone, thanks for the report and detailed information.

I'm still investigating the issue, but so far I believe that the prestart task does not affect the problem, as I was able to reproduce it with a job that has only a single task. It may also not be a regression, as I've seen it happen in 1.4.2 after a few tries.

My current guess is that there's a race condition when the node reconnects and, depending on the order of events, the scheduler will make the wrong decision about the state of the allocation.

The changes made in 1.4.3 (more specifically in #15068) introduced stronger ordering for events, which may have caused this specific race condition to happen more frequently.

I will keep investigating the problem and I will post more updates as I figure things out.

@lgfa29
Contributor

lgfa29 commented Jan 12, 2023

I understand what the problem is now, and have confirmed that:

  • This is indeed a race condition.
  • This is not a regression, the problem can happen in versions of Nomad prior to 1.4.3 as well.
  • 1.4.3 made the condition much more likely to happen due to the strict ordering of two update events (more on this below).
  • The issue is not related to task lifecycle.

The root cause of the problem is that a Nomad client makes two different RPC calls to update its status with the Nomad servers:

  • Node.UpdateStatus updates the client status (ready, down, initializing etc.)
  • Node.UpdateAlloc updates the status of the allocations assigned to the client (running, failed etc.)

These calls are made in parallel, and so can reach the server in any order. This can cause a series of problems for the scheduler because it needs both pieces of information (client status and alloc status) in order to make proper scheduling decisions; if a scheduling operation happens between the two calls, the scheduler has incomplete data to work with.

#15068 made it so Node.UpdateAlloc is a no-op until a client is ready, effectively enforcing a specific order for these events: first a client needs to call Node.UpdateStatus to set its status to ready, and only then can it update its allocations with Node.UpdateAlloc.

This fixed the problem described in #12680 where, upon reconnecting, a client could call Node.UpdateAlloc before Node.UpdateStatus, and so allocations were marked as running while the client was still thought to be disconnected (allocations running on disconnected clients are considered unknown, so they were being incorrectly migrated).

The issue described here is a similar problem, but one where the Node.UpdateStatus call happens before Node.UpdateAlloc, with a pending eval waiting on new resources.

The order of events for this problem to happen is as follows:

  1. A client disconnects and its status is updated to disconnected.
  2. The allocation running in that client is set to unknown.
  3. Nomad creates a follow-up eval to create a replacement.
  4. The replacement cannot be scheduled anywhere else. This is the important bit in this case, and could have happened due to allocs failing on other nodes (as mentioned by @rikislaw) or a lack of clients available for scheduling (such as a specific constraint, as mentioned by @ron-savoia). For my tests I used a single Nomad client, so when the client went down there was no other place to run the new allocation.
  5. The failed placement creates a blocked eval.
  6. The client comes back up and makes a Node.UpdateStatus RPC call.
  7. Nomad marks the node as ready.
  8. The blocked eval gets unblocked and so the replacement alloc is created in the same client that just came back up.
  9. The client makes the second status update call, Node.UpdateAlloc, and updates the previous alloc status to running.
  10. The scheduler detects that two allocations are running where only one should be, so it stops one of them.

If Node.UpdateAlloc happens before Node.UpdateStatus this problem doesn't occur, because the existing allocation is already marked as running and the blocked eval becomes a no-op. Since this order can never happen in Nomad 1.4.3, the issue will always occur there.

I have prototyped a fix and I should have a PR available soon.

@margiran

Hi,
I have followed the reproduction steps mentioned in the issue with the newly released versions.
Testing the sample job with a prestart task (91244_mod-2.nomad) on both Nomad 1.3.10 and 1.5.0, the issue was still observed.

Reproduction steps (I used the same job specs):

  • Deploy the 3 jobs.
  • Start monitoring the jobs with nomad job status -verbose <JOB_ID> at intervals.
  • Once the allocs are verified running on the desired node, run systemctl stop nomad on the client, in my case nomad-client1 (for the job 91244_mod-2.nomad).
  • Wait 5 minutes.
  • Start the nomad service on the client with systemctl start nomad.

For the job with max_client_disconnect and a constraint stanza (example_disconnect), everything was as expected.
As for the job with a prestart task (91244_mod-2.nomad), when the disconnected node becomes ready, the allocation initially placed on the preferred node briefly shows a status of running, but within a few seconds the status changes to complete and the job is moved to pending.

Job 91244_mod-2 (Nomad version 1.3.10):
Alloc Status (before stopping the nomad service)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
14afd7f1-6e43-48a2-0f23-115945cb13d3  b48eff2a-3902-3fa2-510e-61f393e23b14  221c6f40-143a-bb76-0916-864a63a602b0  nomad-client1  91244_mod_group-2  0        run      running  2023-03-09T16:32:53+01:00  2023-03-09T16:32:54+01:00

Alloc Status (nomad client disconnected )

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
0b45abfb-9031-bc4c-57df-172edbe8f5ac  ff95bfd5-1038-420c-b558-d2d6146c695e  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        run      pending  2023-03-09T16:33:54+01:00  2023-03-09T16:33:55+01:00
14afd7f1-6e43-48a2-0f23-115945cb13d3  b48eff2a-3902-3fa2-510e-61f393e23b14  221c6f40-143a-bb76-0916-864a63a602b0  nomad-client1  91244_mod_group-2  0        run      unknown  2023-03-09T16:32:53+01:00  2023-03-09T16:33:54+01:00

Alloc Status (before starting the nomad service)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
e35fba74-6dd5-23d2-8dbc-011d9ffb2847  054b9d92-6cb6-3e30-26d8-ea89a1138726  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        run      failed   2023-03-09T16:38:02+01:00  2023-03-09T16:38:35+01:00
8ab14f74-84e8-e6b8-ff12-4364fd422530  8717bcb2-8740-875a-22fd-3f3d3fbe8349  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-09T16:35:25+01:00  2023-03-09T16:38:02+01:00
0b45abfb-9031-bc4c-57df-172edbe8f5ac  ff95bfd5-1038-420c-b558-d2d6146c695e  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-09T16:33:54+01:00  2023-03-09T16:35:25+01:00
14afd7f1-6e43-48a2-0f23-115945cb13d3  b48eff2a-3902-3fa2-510e-61f393e23b14  221c6f40-143a-bb76-0916-864a63a602b0  nomad-client1  91244_mod_group-2  0        run      unknown  2023-03-09T16:32:53+01:00  2023-03-09T16:33:54+01:00

Alloc Status (nomad client is ready again)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
e35fba74-6dd5-23d2-8dbc-011d9ffb2847  054b9d92-6cb6-3e30-26d8-ea89a1138726  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        run      failed   2023-03-09T16:38:02+01:00  2023-03-09T16:38:35+01:00
8ab14f74-84e8-e6b8-ff12-4364fd422530  8717bcb2-8740-875a-22fd-3f3d3fbe8349  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-09T16:35:25+01:00  2023-03-09T16:38:02+01:00
0b45abfb-9031-bc4c-57df-172edbe8f5ac  ff95bfd5-1038-420c-b558-d2d6146c695e  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-09T16:33:54+01:00  2023-03-09T16:35:25+01:00
14afd7f1-6e43-48a2-0f23-115945cb13d3  b48eff2a-3902-3fa2-510e-61f393e23b14  221c6f40-143a-bb76-0916-864a63a602b0  nomad-client1  91244_mod_group-2  0        stop     running  2023-03-09T16:32:53+01:00  2023-03-09T16:39:00+01:00

Alloc Status (nomad client is ready for 15 minutes)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status    Created                    Modified
6e54d9ad-0e8c-22c4-fa0d-206eebe738bf  a74d7806-fbbf-648a-e7aa-7aa24e3687c3  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        run      failed    2023-03-09T16:51:10+01:00  2023-03-09T16:51:44+01:00
c9677524-56f4-7218-1e1d-2c500278eada  c4a6b7c0-a5e4-1042-ae60-5a8e2aba915f  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed    2023-03-09T16:42:35+01:00  2023-03-09T16:51:10+01:00
e35fba74-6dd5-23d2-8dbc-011d9ffb2847  054b9d92-6cb6-3e30-26d8-ea89a1138726  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed    2023-03-09T16:38:02+01:00  2023-03-09T16:42:35+01:00
8ab14f74-84e8-e6b8-ff12-4364fd422530  8717bcb2-8740-875a-22fd-3f3d3fbe8349  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed    2023-03-09T16:35:25+01:00  2023-03-09T16:38:02+01:00
0b45abfb-9031-bc4c-57df-172edbe8f5ac  ff95bfd5-1038-420c-b558-d2d6146c695e  e90a4135-7b3d-de39-3aa8-c2b9be44805b  nomad-client2  91244_mod_group-2  0        stop     failed    2023-03-09T16:33:54+01:00  2023-03-09T16:35:25+01:00
14afd7f1-6e43-48a2-0f23-115945cb13d3  b48eff2a-3902-3fa2-510e-61f393e23b14  221c6f40-143a-bb76-0916-864a63a602b0  nomad-client1  91244_mod_group-2  0        stop     complete  2023-03-09T16:32:53+01:00  2023-03-09T16:39:01+01:00

Debug bundles taken while testing (1.3.10 & 1.5.0) have been uploaded to:
https://drive.google.com/drive/folders/1lXpT6FYZraqD-Xm5ljIyzh_tX68o7dAj?usp=share_link

@tgross tgross reopened this Mar 10, 2023
@lgfa29
Contributor

lgfa29 commented Mar 21, 2023

I was finally able to reproduce the error. Some key things that are needed:

  • 2+ clients
  • The job must fail on all clients except the one that disconnects

The root cause seems to be flawed logic in the scheduler reconciler that doesn't stop failed allocations when an allocation reconnects, leaving the cluster in a state where two allocs have DesiredStatus = run after the client reconnects.

The reconciler then runs again and notices that it needs to pick one allocation to stop. Since the allocation has already reconnected, the reconciler has no preference about which one to keep, and it may stop either of them.

One key problem is the way the job file is written. It has an implicit constraint that means it can only run on a specific client. This is not a good practice, as it prevents Nomad from being able to perform proper scheduling. Situations like this should be handled with affinity and constraint rather than a prestart task (see the sketch below).
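
For illustration only, a minimal sketch of that recommended pattern, based on the example2.nomad job above: the raw_exec prestart task in 91244_mod-2.nomad is replaced by an explicit constraint (same <HOSTNAME_OF_PREFERRED_NODE> placeholder as in the job files above), so the scheduler itself handles the placement instead of failing allocs on other nodes:

job "91244_mod-2" {
  datacenters = ["dc1"]

  # Explicit constraint replaces the raw_exec prestart task.
  constraint {
    attribute = "${attr.unique.hostname}"
    value     = "<HOSTNAME_OF_PREFERRED_NODE>"
  }

  group "91244_mod_group-2" {
    network {
      port "db" {
        to = 6379
      }
    }

    max_client_disconnect = "12h"

    task "91244_mod_redis-2" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}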

I have a custom build just to validate this assumption. @ron-savoia or @margiran, would either of you be able to validate whether this custom build fixes the problem?

@margiran

Thank you @lgfa29,
I followed the same reproduction steps with the custom build, and it fixed the problem.

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
0dba2e92-22ad-90df-2d09-28b1538f1eb6  c541a8d6-7466-cfb3-3e09-335ccd25a0df  8da31bc9-e7db-5e17-396d-e061d085c346  nomad-client1  91244_mod_group-2  0        run      running  2023-03-21T13:00:13+01:00  2023-03-21T13:00:34+01:00
70b40c99-5814-d30d-36c0-d7cb4e8dde90  1116f96a-999c-6c14-a111-10f3c59666c6  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T12:59:09+01:00  2023-03-21T13:00:13+01:00
***************************************************************************************************
********* stop nomad service on the worker node...
********* waiting 5 min...
Tue Mar 21 13:00:36 CET 2023
***************************************************************************************************
********* 5 Minute (after nomad service Stoped on the worker node)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
09624031-ea7b-a55d-b30e-e8bc13f94deb  b288bf2b-1b72-9d76-009f-d29350893307  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        run      failed   2023-03-21T13:04:04+01:00  2023-03-21T13:04:40+01:00
1ecbf0b6-1a2d-22d1-13e3-c99a199063a8  f3df0b7d-503f-6836-b3b0-7a1857ea5d7f  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T13:01:30+01:00  2023-03-21T13:04:04+01:00
0dba2e92-22ad-90df-2d09-28b1538f1eb6  c541a8d6-7466-cfb3-3e09-335ccd25a0df  8da31bc9-e7db-5e17-396d-e061d085c346  nomad-client1  91244_mod_group-2  0        run      unknown  2023-03-21T13:00:13+01:00  2023-03-21T13:01:30+01:00
70b40c99-5814-d30d-36c0-d7cb4e8dde90  1116f96a-999c-6c14-a111-10f3c59666c6  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T12:59:09+01:00  2023-03-21T13:00:13+01:00
***************************************************************************************************
********* start nomad service ...
********* waiting 5 min...
Tue Mar 21 13:06:13 CET 2023
***************************************************************************************************
********* 5 Minutes (after nomad service started on the worker node)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
09624031-ea7b-a55d-b30e-e8bc13f94deb  b288bf2b-1b72-9d76-009f-d29350893307  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T13:04:04+01:00  2023-03-21T13:06:25+01:00
1ecbf0b6-1a2d-22d1-13e3-c99a199063a8  f3df0b7d-503f-6836-b3b0-7a1857ea5d7f  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T13:01:30+01:00  2023-03-21T13:04:04+01:00
0dba2e92-22ad-90df-2d09-28b1538f1eb6  c541a8d6-7466-cfb3-3e09-335ccd25a0df  8da31bc9-e7db-5e17-396d-e061d085c346  nomad-client1  91244_mod_group-2  0        run      running  2023-03-21T13:00:13+01:00  2023-03-21T13:06:25+01:00
70b40c99-5814-d30d-36c0-d7cb4e8dde90  1116f96a-999c-6c14-a111-10f3c59666c6  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T12:59:09+01:00  2023-03-21T13:00:13+01:00
***************************************************************************************************
********* waiting 10 min...
***************************************************************************************************
********* 15 Minutes (after nomad service started on the worker node)

Allocations
ID                                    Eval ID                               Node ID                               Node Name      Task Group         Version  Desired  Status   Created                    Modified
09624031-ea7b-a55d-b30e-e8bc13f94deb  b288bf2b-1b72-9d76-009f-d29350893307  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T13:04:04+01:00  2023-03-21T13:06:25+01:00
1ecbf0b6-1a2d-22d1-13e3-c99a199063a8  f3df0b7d-503f-6836-b3b0-7a1857ea5d7f  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T13:01:30+01:00  2023-03-21T13:04:04+01:00
0dba2e92-22ad-90df-2d09-28b1538f1eb6  c541a8d6-7466-cfb3-3e09-335ccd25a0df  8da31bc9-e7db-5e17-396d-e061d085c346  nomad-client1  91244_mod_group-2  0        run      running  2023-03-21T13:00:13+01:00  2023-03-21T13:06:25+01:00
70b40c99-5814-d30d-36c0-d7cb4e8dde90  1116f96a-999c-6c14-a111-10f3c59666c6  39e48f6c-afcc-40f0-802d-1ac17827e2af  nomad-client2  91244_mod_group-2  0        stop     failed   2023-03-21T12:59:09+01:00  2023-03-21T13:00:13+01:00

@lgfa29
Contributor

lgfa29 commented Mar 22, 2023

Thank you very much for the confirmation @margiran! I'm working on a proper fix and will open a PR as soon as it's ready.
