Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use compute_ctl to manage Postgres in tests. #3886

Merged
merged 12 commits into from
Jun 6, 2023
Merged

Conversation

hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Mar 27, 2023

This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.

There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:

  • In the tests, the postgresql.conf file is included as one large
    string in the spec file, and it is written out as it is to the data
    directory. I added a new field for that to the spec file. The real
    web console, however, sets all the necessary settings in the
    'settings' field, and 'compute_ctl' creates the postgresql.conf from
    those settings.

  • In the tests, the information needed to connect to the storage, i.e.
    tenant_id, timeline_id, connection strings to pageserver and
    safekeepers, are now passed as new fields in the spec file. The real
    web console includes them as the GUCs in the 'settings' field. (Both
    of these are different from what the test control plane used to do:
    It used to write the GUCs directly in the postgresql.conf file). The
    plan is to change the control plane to use the new method, and
    remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:

  • remove obsolete comments about PostgresNode;
  • Create standby.signal file for Static compute node;
  • log output of compute_ctl and postgres is merged into endpoints/compute.log.

@hlinnaka hlinnaka requested review from MMeent, ololobus and kelvich March 27, 2023 15:48
Copy link
Member

@ololobus ololobus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read everything thoroughly, but here are some comments. In overall, looks nice!

libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
libs/compute_api/src/spec.rs Show resolved Hide resolved
compute_tools/src/config.rs Show resolved Hide resolved
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
compute_tools/src/bin/compute_ctl.rs Outdated Show resolved Hide resolved
compute_tools/src/pg_helpers.rs Outdated Show resolved Hide resolved
@hlinnaka hlinnaka marked this pull request as ready for review March 30, 2023 09:35
@hlinnaka hlinnaka requested review from a team as code owners March 30, 2023 09:35
@hlinnaka hlinnaka requested review from shanyp and removed request for a team March 30, 2023 09:35
@hlinnaka hlinnaka force-pushed the test-compute_ctl branch 2 times, most recently from 11946ab to 58d39fa Compare April 3, 2023 07:17
@hlinnaka hlinnaka changed the base branch from main to neon_local-endpoints April 3, 2023 14:31
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
libs/compute_api/src/spec.rs Outdated Show resolved Hide resolved
@hlinnaka hlinnaka force-pushed the neon_local-endpoints branch 3 times, most recently from 9083d1e to 56bf5b4 Compare April 3, 2023 17:32
@hlinnaka hlinnaka force-pushed the neon_local-endpoints branch 4 times, most recently from 1326ccc to d823c15 Compare April 9, 2023 19:01
@hlinnaka hlinnaka force-pushed the neon_local-endpoints branch from d823c15 to 50e38f5 Compare April 12, 2023 07:01
@hlinnaka hlinnaka force-pushed the test-compute_ctl branch 3 times, most recently from aed055e to 13d2254 Compare May 5, 2023 10:50
@hlinnaka hlinnaka requested a review from MMeent May 5, 2023 10:53
compute_tools/src/bin/compute_ctl.rs Show resolved Hide resolved
control_plane/src/endpoint.rs Show resolved Hide resolved
control_plane/src/endpoint.rs Outdated Show resolved Hide resolved
@hlinnaka hlinnaka force-pushed the test-compute_ctl branch 2 times, most recently from 4b7f5b5 to 75227e8 Compare May 9, 2023 17:25
hlinnaka and others added 10 commits June 5, 2023 18:46
If the port is hard-coded, you cannot have more than one compute_ctl
running at a time. (Or not at all, if you're unlucky and the port is
in use)
'compute_ctl' merely prints them to the log.

The next commit will make the local test control plane to also use
'compute_ctl', and it doesn't have the concept of a project, so it
cannot fill them with any sensible values.
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.

There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:

- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.
It was renamed to Endpoint and refactored to reduce code duplication.
@lubennikovaav lubennikovaav merged commit df3bae2 into main Jun 6, 2023
@lubennikovaav lubennikovaav deleted the test-compute_ctl branch June 6, 2023 13:59
@hlinnaka
Copy link
Contributor Author

hlinnaka commented Jun 6, 2023

LGTM, thanks!

awestover pushed a commit that referenced this pull request Jun 14, 2023
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
    
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
    
- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.

---------

Co-authored-by: Anastasia Lubennikova <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants