Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report attach failures in pageserver API #4344

Closed
Tracked by #886
LizardWizzard opened this issue May 25, 2023 · 4 comments · Fixed by #4371
Closed
Tracked by #886

Report attach failures in pageserver API #4344

LizardWizzard opened this issue May 25, 2023 · 4 comments · Fixed by #4371
Assignees

Comments

@LizardWizzard
Copy link
Contributor

LizardWizzard commented May 25, 2023

Based on conversation here: https://neondb.slack.com/archives/C033A2WE6BZ/p1684854161739799

TLDR: we need to return Failed status instead of Attached if tenant is in broken state

@LizardWizzard LizardWizzard self-assigned this May 25, 2023
@problame
Copy link
Contributor

Cross-posting @koivunej 's comment on this from PR review #4321 (would be good to base this work on top of that PR)

#4321 (comment)

So right now "not found from s3" tenant on attach will yield broken and is assumed to stay broken, as in we will not get rid of the tenant in memory? Confirmed we don't.

For me the answer seems to be quite simple; just forget everything about the tenant if we never had any possibility of attaching it? Problem will probably be then crash-safety, because we will have the files on disk. It seems even the attach marker was left, so until a manual detach comes we can't make progress even across restarts. This is most likely current design.

I am unsure if we need to focus on solving the "tried to attach tenant from different region" issue.

@koivunej
Copy link
Member

Note this is already causing rare flakyness on #4321 so I am thinking we cannot yet release this: #4321 (comment).

@koivunej
Copy link
Member

Solution in #4321 ended up as d6d0c74 which was:

  • use the attachment_status of the state from which we transitioned to Activating (loading or attaching)

@LizardWizzard
Copy link
Contributor Author

I'll pick this up once #4321 is merged

LizardWizzard added a commit that referenced this issue Jun 7, 2023
## Problem

Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).

## Summary of changes

Expose TenantState::Broken as TenantAttachmentStatus::Failed

In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.

This can be tweaked if needed. At the current stage it would be overengineering without clear motivation

resolves #4344
awestover pushed a commit that referenced this issue Jun 14, 2023
## Problem

Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).

## Summary of changes

Expose TenantState::Broken as TenantAttachmentStatus::Failed

In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.

This can be tweaked if needed. At the current stage it would be overengineering without clear motivation

resolves #4344
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants