Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert Timeline::layers to async RwLock atop #4333 #4361

Conversation

problame
Copy link
Contributor

@problame problame commented May 26, 2023

convert Timeline::layers to async RwLock atop #4333

skyzh and others added 21 commits May 24, 2023 19:36
This PR refactors the original page_binutils with a single tool pagectl,
use clap derive for better command line parsing, and adds the dump kv
tool to extract information from delta file. This helps me better
understand what's inside the page server. We can add support for other
types of file and more functionalities in the future.

---------

Signed-off-by: Alex Chi <[email protected]>
Probably increase CI success rate.

---------

Signed-off-by: Alex Chi <[email protected]>
(Instead of going through mgr every iteration.)

The `wait_for_active_tenant` function's `wait` argument could be removed
because it was only used for the loop that waits for the tenant to show
up in the tenants map. Since we're passing the tenant in, we now longer
need to get it from the tenants map.

NB that there's no guarantee that the tenant object is in the tenants
map at the time the background loop function starts running. But the
tenant mgr guarantees that it will be quite soon. See
`tenant_map_insert` way upwards in the call hierarchy for details.

This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 )

Fixes: #3501
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
Before this patch, it would use error type `TenantStateError` which has
many more error variants than can actually happen with
`mgr::get_tenant`.

Along the way, I also introduced `SetNewTenantConfigError` because it
uses `mgr::get_tenant` and also can only fail in much fewer ways than
`TenantStateError` suggests.

The new `page_service.rs`'s `GetActiveTimelineError` and
`GetActiveTenantError` types were necessary to avoid an `Other` variant
on the `GetTenantError`.

This patch is a by-product of reading code that subscribes to
`Tenant::state` changes.
Can't really connect it to any given project.
Note this also changes the status code to the (correct) 404. Not sure if
that's relevant to Console.

Context:
https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF

Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.
(This is prep work to make `Timeline::activate` infallible.)

This patch removes the global storage_broker client instance from the
pageserver codebase.

Instead, pageserver startup instantiates it and passes it down to the
`Timeline::activate` function, which in turn passes it to the
WalReceiver, which is the entity that actually uses it.

Patch series:

- #4316
- #4317
- #4318
- #4319
proceeding #4237, this PR bumps
AWS dependencies along with all other dependencies to the latest
compatible semver.

Signed-off-by: Alex Chi <[email protected]>
Flakyness most likely introduced in #4170, detected in
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33.

Opted to allow it globally because it can happen in other tests as well,
basically whenever compaction is enabled and we stop pageserver
gracefully.
Routine `vm-builder` version bump, from autoscaling repo release. You
can find the release notes here:
https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0
The changes are from v0.7.2 — most of them were already included in
v0.7.3-alpha3.

Of particular note: This (finally) fixes the cgroup issues, so we should
now be able to scale up when we're about to run out of memory.

**NB:** This has the effect of limit the DB's memory usage in a way it
wasn't limited before. We may run into issues because of that. There is
currently no way to disable that behavior, other than switching the
endpoint back to the k8s-pod provisioner.
Timeline::activate() was only fallible because `launch_wal_receiver`
was.

`launch_wal_receiver` was fallible only because of some preliminary
checks in `WalReceiver::start`.

Turns out these checks can be shifted to the type system by delaying
creatinon of the `WalReceiver` struct to the point where we activate the
timeline.

The changes in this PR were enabled by my previous refactoring that
funneled the broker_client from pageserver startup to the activate()
call sites.

Patch series:

- #4316
- #4317
- #4318
- #4319
## Problem

Test `test_metric_collection` become flaky:

```
AssertionError: assert not ['2023-05-25T14:03:41.644042Z ERROR metrics_collection: failed to send metrics: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(18022), path: "/billing/api/v1/usage_events", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" })) }',
                            ...]
```
I suspect it is caused by having 2 places when we define
`httpserver_listen_address` fixture (which is internally used by
`pytest-httpserver` plugin)

## Summary of changes
- Remove the definition of `httpserver_listen_address` from
`test_runner/regress/test_ddl_forwarding.py` and keep one in
`test_runner/fixtures/neon_fixtures.py`
- Also remote unused `httpserver_listen_address` parameter from
`test_proxy_metric_collection`
We currently have a semaphore based rate limiter which we hope will keep
us under S3 limits. However, the semaphore does not consider time, so
I've been hesitant to raise the concurrency limit of 100.

See #3698.

The PR Introduces a leaky-bucket based rate limiter instead of the
`tokio::sync::Semaphore` which will allow us to raise the limit later
on. The configuration changes are not contained here.
We used to generate the ID, if the caller didn't specify it. That's bad
practice, however, because network is never fully reliable, so it's
possible we create a new tenant but the caller doesn't know about it,
and because it doesn't know the tenant ID, it has no way of retrying or
checking if it succeeded. To discourage that, make it mandatory. The web
control plane has not relied on the auto-generation for a long time.
## Problem

GitHub Autocomment script posts a comment only for PRs. It's harder
to debug failed tests on main or release branches.

## Summary of changes

- Change the GitHub Autocomment script to be able to post a comment to
either a PR or a commit of a branch
This reverts commit 024109f for it
failing to be speed up anything, but run into more errors.

See: #3698.
…ct_level0_phase1

Without this, the seocnd read().unwrap() becomes an await point,
which makes the future not-Send, but, we require it to be Send
because it runs inside task_mgr::spawn, which requires the Fut's to be Send
The checkpointer loop isn't running anyway, so, there's no risk of
blocking it through the pre-lock.

(cherry picked from commit 1b26633)
…utex internally

fails with

cs@devvm:[~/src/neon]: cargo check -p pageserver  --features testing
    Checking pageserver v0.1.0 (/home/cs/src/neon/pageserver)
error: future cannot be sent between threads safely
   --> pageserver/src/tenant/timeline/walreceiver/connection_manager.rs:426:33
    |
426 |         let connection_handle = TaskHandle::spawn(move |events_sender, cancellation| {
    |                                 ^^^^^^^^^^^^^^^^^ future created by async block is not `Send`
    |
    = help: within `Instrumented<[async block@pageserver/src/tenant/timeline/walreceiver/connection_manager.rs:427:13: 439:14]>`, the trait `std::marker::Send` is not implemented for `std::sync::RwLockReadGuard<'_, LayerMap<dyn PersistentLayer>>`
note: future is not `Send` as this value is used across an await
   --> pageserver/src/tenant/timeline.rs:872:46
    |
850 |         let layers = self.layers.read().unwrap();
    |             ------ has type `std::sync::RwLockReadGuard<'_, LayerMap<dyn PersistentLayer>>` which is not `Send`
...
872 |                 self.freeze_inmem_layer(true).await;
    |                                              ^^^^^^ await occurs here, with `layers` maybe used later
...
881 |     }
    |     - `layers` is later dropped here
note: required by a bound in `TaskHandle::<E>::spawn`
   --> pageserver/src/tenant/timeline/walreceiver.rs:196:52
    |
192 |     fn spawn<Fut>(
    |        ----- required by a bound in this
...
196 |         Fut: Future<Output = anyhow::Result<()>> + Send,
    |                                                    ^^^^ required by this bound in `TaskHandle::<E>::spawn`

error: could not compile `pageserver` due to previous error
@problame problame changed the base branch from main to problame/async-timeline-get/tenant-timelines-tokio-sync-mutex May 26, 2023 18:21
skyzh and others added 8 commits May 26, 2023 17:15
Compaction is usually a compute-heavy process and might affect other
futures running on the thread of the compaction. Therefore, we add
`block_in_place` as a temporary solution to avoid blocking other futures
on the same thread as compaction in the runtime. As we are migrating
towards a fully-async-style pageserver, we can revert this change when
everything is async and when we move compaction to a separate runtime.

---------

Signed-off-by: Alex Chi <[email protected]>
Require the error type to be ApiError. It implicitly required that
anyway, because the function used error::handler, which downcasted the
error to an ApiError. If the error was in fact anything else than
ApiError, it would just panic. Better to check it at compilation time.

Also make the last-resort error handler more forgiving, so that it
returns an 500 Internal Server error response, instead of panicking,
if a request handler returns some other error than an ApiError.
Previously, you used it like this:

    |r| RequestSpan(my_handler).handle(r)

But I don't see the point of the RequestSpan struct. It's just a
wrapper around the handler function. With this commit, the call
becomes:

    |r| request_span(r, my_handler)

Which seems a little simpler.

At first I thought that the RequestSpan struct would allow "chaining"
other kinds of decorators like RequestSpan, so that you could do
something like this:

    |r| CheckPermissions(RequestSpan(my_handler)).handle(r)

But it doesn't work like that. If each of those structs wrap a handler
*function*, it would actually look like this:

    |r| CheckPermissions(|r| RequestSpan(my_handler).handle(r))).handle(r)

This commit doesn't make that kind of chaining any easier, but seems a
little more straightforward anyway.
If the timeline is already being deleted, return an error. We used to
notice the duplicate request and error out in
persist_index_part_with_deleted_flag(), but it's better to detect it
earlier. Add an explicit lock for the deletion.

Note: This doesn't do anything about the async cancellation problem
(github issue #3478): if the original HTTP request dropped, because the
client disconnected, the timeline deletion stops half-way through the
operation. That needs to be fixed, too, but that's a separate story.

(This is a simpler replacement for PR #4194. I'm also working on the
cancellation shielding, see PR #4314.)
#4155 inadvertently switched to a version of the VM builder that leaves
the file cache integration disabled by default. This re-enables the
vm-informant's file cache integration.

(as a refresher: The vm-informant is the autoscaling component that sits
inside the VM and manages postgres / compute_ctl)

See also: neondatabase/autoscaling#265
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
The motivation is to avoid locking `Tenant::timelines` in places that
can't be `async`, because in #4333 we want to convert Tenant::timelines
from `std::sync::Mutex` to `tokio::sync::Mutex`.

But, the changes here are useful in general because they clean up &
document tenant state transitions.
That also paves the way for #4350, which is an alternative to #4333 that
refactors the pageserver code so that we can keep the
`Tenant::timelines` mutex sync.

This patch consists of the following core insights and changes:

* spawn_load and spawn_attach own the tenant state until they're done
* once load()/attach() calls are done ...
* if they failed, transition them to Broken directly (we know that
there's no background activity because we didn't call activate yet)
* if they succeed, call activate. We can make it infallible. How? Later.
* set_broken() and set_stopping() are changed to wait for spawn_load() /
spawn_attach() to finish.
* This sounds scary because it might hinder detach or shutdown, but
actually, concurrent attach+detach, or attach+shutdown, or
load+shutdown, or attach+shutdown were just racy before this PR.
     So, with this change, they're not anymore.
In the future, we can add a `CancellationToken` stored in Tenant to
cancel `load` and `attach` faster, i.e., make `spawn_load` /
`spawn_attach` transition them to Broken state sooner.

See the doc comments on TenantState for the state transitions that are
now possible.
It might seem scary, but actually, this patch reduces the possible state
transitions.

We introduce a new state `TenantState::Activating` to avoid grabbing the
`Tenant::timelines` lock inside the `send_modify` closure.
These were the humble beginnings of this PR (see Motivation section),
and I think it's still the right thing to have this `Activating` state,
even if we decide against async `Tenant::timelines` mutex. The reason is
that `send_modify` locks internally, and by moving locking of
Tenant::timelines out of the closure, the internal locking of
`send_modify` becomes a leaf of the lock graph, and so, we eliminate
deadlock risk.

Co-authored-by: Joonas Koivunen <[email protected]>
Startup can take a long time. We suspect it's the initial logical size
calculations. Long term solution is to not block the tokio executors but
do most of I/O in spawn_blocking.

See: #4025, #4183

Short-term solution to above:

- Delay global background tasks until initial tenant loads complete
- Just limit how many init logical size calculations can we have at the
same time to `cores / 2`

This PR is for trying in staging.
added in #4366. revert for testing without it; it may have unintenteded
side-effects, and it's very difficult to understand the results from the
10k load testing environments. earlier results:
#4366 (comment)
bojanserafimov and others added 27 commits May 31, 2023 12:23
…d… (#4353)

This parameter can be use to restrict number of image layers generated
because of GC request (wanted image layers).
Been set to zero it completely eliminates creation of such image layers.
So it allows to avoid extra storage consumption after merging #3673

## Problem
PR #3673 forces generation of missed image layers. So i short term is
cause cause increase (in worst case up to two times) size of storage.
It was intended (by me) that GC period is comparable with PiTR interval.
But looks like it is not the case now - GC is performed much more
frequently. It may cause the problem with space exhaustion: GC forces
new image creation while large PiTR still prevent GC from collecting old
layers.

## Summary of changes

Add new pageserver parameter` forced_image_creation_limit` which
restrict number of created image layers which are requested by GC.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
compute_ctl can panic, but `tracing` is used for logging. panic stderr
output can interleave with messages from normal logging. The fix is to
use the established way (pageserver, safekeeper, storage_broker) of using
`tracing` to report panics.
## Problem

This PR includes doc changes to the current metrics as well as adding
new metrics. With the new set of metrics, we can quantitatively analyze
the read amp., write amp. and space amp. in the system, when used
together with https://github.com/neondatabase/neonbench

close #4312
ref #4347

compaction metrics TBD, a novel idea is to print L0 file number and
number of layers in the system, and we can do this in the future when we
start working on compaction.

## Summary of changes

* Add `READ_NUM_FS_LAYERS` for computing read amp.
* Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`.
* Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME +
RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of
reads.
* Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync
runs slow (i.e., > 1s) in some cases.
* Some `WAL_REDO` metrics are only used when Postgres is involved in the
redo process.

---------

Signed-off-by: Alex Chi <[email protected]>
## Problem

Part of #4373

## Summary of changes

This PR adds `PersistentLayerDesc`, which will be used in LayerMap
mapping and probably layer cache. After this PR and after we change
LayerMap to map to layer desc, we can safely drop RemoteLayerDesc.

---------

Signed-off-by: Alex Chi <[email protected]>
Co-authored-by: bojanserafimov <[email protected]>
See superceded #4390.

- capture log in test
- expand the span to cover init and error reporting
- remove obvious logging by logging only unexpected
We now spawn a new task for every HTTP request, and wait on the
JoinHandle. If Hyper drops the Future, the spawned task will keep
running. This protects the rest of the pageserver code from unexpected
async cancellations.

This creates a CancellationToken for each request and passes it to the
handler function. If the HTTP request is dropped by the client, the
CancellationToken is signaled. None of the handler functions make use
for the CancellationToken currently, but they now they could.

The CancellationToken arguments also work like documentation. When
you're looking at a function signature and you see that it takes a
CancellationToken as argument, it's a nice hint that the function might
run for a long time, and won't be async cancelled. The default
assumption in the pageserver is now that async functions are not
cancellation-safe anyway, unless explictly marked as such, but this is a
nice extra reminder.

Spawning a task for each request is OK from a performance point of view
because spawning is very cheap in Tokio, and none of our HTTP requests
are very performance critical anyway.

Fixes issue #3478
Upgrading to rust 1.70 will require these.
## Describe your changes

Port HNSW implementation for ANN search top Postgres 

## Issue ticket number and link

https://www.pinecone.io/learn/hnsw

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
As seen on deployment of 2023-06-01 release, times were improving but
there were some outliers caused by:
- timelines `eviction_task` starting while activating and running
imitation
- timelines `initial logical size` calculation

This PR fixes it so that `eviction_task` is delayed like other
background tasks fixing an oversight from earlier #4372.

After this PR activation will be two phases:
1. load and activate tenants AND calculate some initial logical sizes
2. rest of initial logical sizes AND background tasks
- compaction, gc, disk usage based eviction, timelines `eviction_task`,
consumption metrics
Create pgx_ulid extension

```
postgres=# create extension ulid;
CREATE EXTENSION
postgres=# CREATE TABLE users (
  id ulid NOT NULL DEFAULT gen_ulid() PRIMARY KEY,
  name text NOT NULL
);
CREATE TABLE
postgres=# insert into users (name) values ('vadim');
INSERT 0 1
postgres=# select * from users;
             id             | name
----------------------------+-------
 01H25DDG3KYMYZTNR41X38E256 | vadim
```
Solves a flaky test error in the wild[^1] by:

- Make the gc shutdown signal reading an `allowed_error`
- Note the gc shutdown signal readings as being in `allowed_error`s
- Allow passing tenant conf to init_start to avoid unncessary tenants

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93
walreceiver logs are a bit hard to understand because of partial span
usage, extra messages, ignored errors popping up as huge stacktraces.

Fixes #3330 (by spans, also demote info -> debug).

- arrange walreceivers spans into a hiearchy: 
    - `wal_connection_manager{tenant_id, timeline_id}` ->
      `connection{node_id}` -> `poller`
- unifies the error reporting inside `wal_receiver`:
- All ok errors are now `walreceiver connection handling ended: {e:#}`
- All unknown errors are still stacktraceful task_mgr reported errors
  with context `walreceiver connection handling failure`
- Remove `connect` special casing, was: `DB connection stream finished`
  for ok errors
- Remove `done replicating` special casing, was `Replication stream
  finished` for ok errors
- lowered log levels for (non-exhaustive list):
    - `WAL receiver manager started, connecting to broker` (at startup)
    - `WAL receiver shutdown requested, shutting down` (at shutdown)
    - `Connection manager loop ended, shutting down` (at shutdown)
    - `sender is dropped while join handle is still alive` (at lucky
      shutdown, see #2885)
    - `timeline entered terminal state {:?}, stopping wal connection manager
      loop` (at shutdown)
    - `connected!` (at startup)
- `Walreceiver db connection closed` (at disconnects?, was without span)
    - `Connection cancelled` (at shutdown, was without span)
- `observed timeline state change, new state is {new_state:?}` (never
  after Timeline::activate was made infallible)
- changed:
    - `Timeline dropped state updates sender, stopping wal connection
      manager loop`
    - was out of date; sender is not dropped but `Broken | Stopping` state
      transition
        - also made `debug!`
    - `Timeline dropped state updates sender before becoming active,
      stopping wal connection manager loop`
    - was out of date: sender is again not dropped but `Broken | Stopping`
      state transition
        - also made `debug!`
- log fixes:
    - stop double reporting panics via JoinError
added the `allowed_error` to the `positive_env` so any tests completing
the attach are allowed have this print out. they are allowed to do so,
because the `random_init_delay` can produce close to zero and thus the
first run will be near attach. Though... Unsure if we ever really need
the eviction task to run **before** it can evict something, as in after
20min or 24h.

in the failed test case however period is 20s so interesting that we
didn't run into this sooner.

evidence of flaky:
https://github.com/neondatabase/neon/actions/runs/5175677035/jobs/9323705929?pr=4399#step:4:38536
…#4412)

…ompiler

## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
Flakyness introduced by #4402 evidence [^1].

I had assumed the NotConnected would had been an expected io error, but
it's not. Restore the global `allowed_error`.

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/
We have 2 ways of tenant shutdown, we should have just one.

Changes are mostly mechanical simple refactorings.

Added `warn!` on the "shutdown all remaining tasks" should trigger test
failures in the between time of not having solved the "tenant/timeline
owns all spawned tasks" issue.

Cc: #4327.
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
    
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
    
- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.

---------

Co-authored-by: Anastasia Lubennikova <[email protected]>
context: #4414

And improve messages/comments here and there.
sync-safekeepers doesn't know it and sends 0.
The issue was in the wrong `control` file name
Initial logical size calculation could still hinder our fast startup
efforts in #4397. See #4183. In deployment of 2023-06-06
about a 200 initial logical sizes were calculated on hosts which
took the longest to complete initial load (12s).

Implements the three step/tier initialization ordering described in
#4397:
1. load local tenants
2. do initial logical sizes per walreceivers for 10s
3. background tasks

Ordering is controlled by:
- waiting on `utils::completion::Barrier`s on background tasks
- having one attempt for each Timeline to do initial logical size
calculation
- `pageserver/src/bin/pageserver.rs` releasing background jobs after
timeout or completion of initial logical size calculation

The timeout is there just to safeguard in case a legitimate non-broken
timeline initial logical size calculation goes long. The timeout is
configurable, by default 10s, which I think would be fine for production
systems. In the test cases I've been looking at, it seems that these
steps are completed as fast as possible.

Co-authored-by: Christian Schwarz <[email protected]>
@github-actions
Copy link

github-actions bot commented Jun 7, 2023

1000 tests run: 958 passed, 1 failed, 41 skipped (full report)


Failures on Posgres 14

  • test_get_tenant_size_with_multiple_branches: release
# Run failed on Postgres 14 tests locally:
DEFAULT_PG_VERSION=14 scripts/pytest -k "test_get_tenant_size_with_multiple_branches[release-pg14]"
Flaky tests (1)

Postgres 15

  • test_remote_storage_upload_queue_retries[local_fs]: ✅ debug
The comment gets automatically updated with the latest test results
a66a16d at 2023-06-07T14:40:36.304Z :recycle:

@problame
Copy link
Contributor Author

Abandoned in favor of #4441

@problame problame closed this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.