Explain node liveness #4475

Closed
awoods187 opened this issue Mar 8, 2019 · 4 comments · Fixed by #6322

Comments

@awoods187
Contributor

Explain the causes of node liveness problems, how to diagnose that it's the problem, and what to do as a result.

@awoods187
Contributor Author

This needs to go wherever we store frequent problems for people to debug.

@jseldess
Contributor

@awoods187, can you please provide more details about the specific problem or resources to look at?

@awoods187
Contributor Author

Node liveness is a frequent problem users encounter. See:
- A customer experienced an outage due to node liveness
- Another customer experienced a dead node due to node liveness
- General node liveness problems

Node liveness errors are often a symptom of other problems. We should have a page that people can find when they search for node liveness errors. It should say that these errors are symptoms, list some common causes, and describe what information users should collect and share with us for support when needed.

@rmloveland
Contributor

Related to (but really a subset of) #6319.

rmloveland added a commit that referenced this issue Jan 10, 2020
Fixes #4475.

Summary of changes:

- Add new 'Node liveness' section to the 'Troubleshoot cluster setup'
  page, including:

  - What it is

  - How the system checks for it

  - Causes of common problems with node liveness (overloaded disk /
    busted network connectivity)
rmloveland added a commit that referenced this issue Jan 22, 2020
Fixes #4475.

Summary of changes:

- Add new 'Node liveness' section to the 'Troubleshoot cluster setup'
  page, including:

  - What it is

  - Common causes of problems with it (overloaded disk / busted network
    connectivity)

  - Several places to check for it in the Admin UI
rmloveland added a commit that referenced this issue Feb 5, 2020
Fixes #4475.

Summary of changes:

- Add new 'Node liveness' section to the 'Troubleshoot cluster setup'
  page, including:

  - What it is

  - Common causes of problems with it (overloaded disk / busted network
    connectivity)

  - Several places to check for it in the Admin UI, including expected
    values for a healthy cluster
rmloveland added a commit that referenced this issue Feb 12, 2020
Fixes #4475.

Summary of changes (applies to 19.2 and 20.1 docs):

- Add new 'Node liveness' section to the 'Troubleshoot cluster setup'
  page, including:

  - What node liveness is

  - Common causes of problems (overloaded disk / busted network
    connectivity)

  - Several places to check for it in the Admin UI, including expected
    values for a healthy cluster