[postgresql-ha] delete pods in ha break cluster #2805

borgez · 2020-06-10T14:02:20Z

Which chart:
postgresql-ha:3.2.9

Describe the bug
postgresql-0, postgresql-1, postgresql-2 cannot start agan because repmgr passwords change when pods recreate

To Reproduce

create with default config
wait to full up Workloads
hard delete all pods from deployment (like update chart)
wait for pods recreate and up
pods cannot start because passwords change what?

Expected behavior

pods up, cluster resync
values in Secrets not change

marcosbc · 2020-06-11T11:42:20Z

Hi @borgez, when you mention this:

hard delete all pods from deployment (like update chart)

Do you mean running a command such as "helm update"?

Note that if you do that, secrets are regenerated and if you had a persistent volume, there will be a mismatch between the current secrets and configured secrets, meaning the authentication fails.

To fix that, you have a couple of options:

Specify existingSecrets with the name of the previous -working- secrets (i.e. postgresql.existingSecrets/pgpool.existingSecrets/ldap.existingSecrets).
Specify the password in each update of the chart (i.e. postgresql.password/postgresql.repmgrPassword/ldap.bindPassword/pgpool.adminPassword).

IMPORTANT: These options need to be specified both in the creation (helm install) and update (helm update) of the chart. If you don't have the values for the previous deployment, either delete the existing PVC and lose any previous data (the simpler option) or reset the password to match the existing secret (hard). Finally, re-crete the deployment specifying the secrets/password.

borgez · 2020-06-11T14:04:42Z

Do you mean running a command such as "helm update"?

I tried to break the cluster in different ways to check its fault tolerance.
I think I did similar thing to "helm update"

Note that if you do that, secrets are regenerated and if you had a persistent volume, there will be a mismatch between the current secrets and configured secrets, meaning the authentication fails.

it break cluster

Specify existingSecrets with the name of the previous -working- secrets (i.e. postgresql.existingSecrets/pgpool.existingSecrets/ldap.existingSecrets).

I not found this in docs

Specify the password in each update of the chart (i.e. postgresql.password/postgresql.repmgrPassword/ldap.bindPassword/pgpool.adminPassword).

it fix for me, yes

But this is not the expected behavior.
In docs i found update section and i expect it will be new password, not exist, this not clear why is there such default behavior.

We have secrets and I expect that when updating nothing will break and existing secrets wil be reuse as default.
When i want change passwords i expect i pass somting like --new-password="1234" or update it manually by cmd

borgez · 2020-06-11T14:24:33Z

Or may be helm param global.rollPasswords=true, but not by default

borgez · 2020-06-11T14:38:57Z

And i think if chart force roll password it need to notify payload about it and change passwords in payload postgresql, repmgr etc, it break kubernetes philosophy

joancafom · 2020-06-15T07:59:45Z

Hi @borgez

By default, if no password is provided by the end user, the Chart will generate and use a random one instead. When performing an upgrade (and not specifying a password) this behaviour is preserved. By providing with the same credentials (or an existingSecret containing them), this random stage is skipped. As aforementioned, this is specially important when dealing with persistent volumes, as the new release may change the secrets but in no manner will it be able to alter previously persisted configuration.

borgez · 2020-06-15T09:52:08Z

Hi @joancafom

Yes, if secrets alredy exists random stage should be skipped but it not...

borgez · 2020-06-15T10:10:15Z

I record screencast https://nimb.ws/MkaKpZ

joancafom · 2020-06-16T08:07:33Z

Hi @borgez

I will try to exemplify my words by running an example. Let's say I want to deploy a new release of my chart using helm:

$ helm install tauro bitnami/postgresql-ha                                                                                                                                                                 
NAME: tauro
LAST DEPLOYED: Tue Jun 16 09:43:02 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
** Please be patient while the chart is being deployed **
...

To get the password for "postgres" run:

    export POSTGRES_PASSWORD=$(kubectl get secret --namespace default tauro-postgresql-ha-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)

To get the password for "repmgr" run:

    export REPMGR_PASSWORD=$(kubectl get secret --namespace default tauro-postgresql-ha-postgresql -o jsonpath="{.data.repmgr-password}" | base64 --decode)
...

Once I give some reasonable amount of time for the pods to go up, I am successfully able to see them all running:

$ kubectl get pods                                                                                                                                                                                             
NAME                                          READY   STATUS    RESTARTS   AGE
tauro-postgresql-ha-pgpool-5c48ff85f8-hkdwd   1/1     Running   0          3m52s
tauro-postgresql-ha-postgresql-0              1/1     Running   0          3m52s
tauro-postgresql-ha-postgresql-1              1/1     Running   0          3m44s

This release also came along with some secrets using that were populated using random passwords, as I did not specify any on installation time:

$ kubectl get secrets                                                                                                                                                                                            
NAME                             TYPE                                  DATA   AGE
tauro-postgresql-ha-pgpool       Opaque                                1      14m
tauro-postgresql-ha-postgresql   Opaque                                2      14m

Now imagine that I want to update the number of replicas, be it one more than current's count. In order to preserve the same passwords and prevent the chart from failing on the update, I must provide the old passwords. We can create two new variables containing them, using the instructions provided on the first command:

$ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default tauro-postgresql-ha-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
$ export REPMGR_PASSWORD=$(kubectl get secret --namespace default tauro-postgresql-ha-postgresql -o jsonpath="{.data.repmgr-password}" | base64 --decode)

I can safely perform the update now:

$ helm upgrade tauro --set postgresql.password=$POSTGRES_PASSWORD \
--set postgresql.repmgrPassword=$REPMGR_PASSWORD \
--set postgresql.replicaCount=3 \
bitnami/postgresql-ha

$ kubectl get pods                                                                                                                                                                                               
NAME                                          READY   STATUS    RESTARTS   AGE
tauro-postgresql-ha-pgpool-64b79f5659-l869k   1/1     Running   0          94s
tauro-postgresql-ha-postgresql-0              1/1     Running   1          32s
tauro-postgresql-ha-postgresql-1              1/1     Running   0          62s
tauro-postgresql-ha-postgresql-2              1/1     Running   2          94s

If the passwords weren't provided in the upgrade stage, the Chart will create new pairs of randomly generated credentials, leading to a failure. We are currently considering how to improve this behaviour.

I hope this running example helps, thanks!

stale · 2020-07-01T13:22:27Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

borgez · 2020-07-01T13:30:38Z

@joancafom hi, you have plan to fix it?

joancafom · 2020-07-02T14:02:17Z

Hi @borgez

We are evaluating different approaches regarding how to skip this complementary information provision when performing an upgrade, but I am afraid I can't give you an ETA. On the meantime, README.md contains a section explaining this is necessary and how to overcome the problem.

Thanks again!

stale · 2020-07-17T23:52:25Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

joancafom · 2020-07-20T10:26:57Z

Hi again,

We are currently making improvements to prevent this issue from happening. You can check this PR #3150 for further details 😄, but we plan to log some errors when some required fields were not provided in the upgrade.

Thanks for your issue!

borgez · 2020-07-20T11:25:28Z

@joancafom Great, thank you!

stale · 2020-08-06T08:40:27Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

tombh · 2020-08-09T16:22:21Z

I just ran into this problem too. I've manually set global.postgresql.database.repmgrPassword now so it should be ok. It's not clear from PR #3150 whether this is fixed yet? Or indeed if that PR is just providing better error reporting?

joancafom · 2020-08-11T07:57:50Z

Hi @tombh

We are still implementing changes regarding this (check PR #3335 out). Once they are done, any upgrade release not providing the correct parameters will block and report the issue to the end user, thus preventing this situation from happening.

Thanks

stale · 2020-08-27T01:15:56Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

rafariossaa · 2020-08-27T09:53:34Z

Hi ,
Sorry for the delay, the PR #3335 was merged.
Could you give this a try and check if this issue keeps happening to you ?

tombh · 2020-08-27T09:56:47Z

Thank you ❤️️ I'll test on our next blue/green cluster swap, but might not be for a while :/

stale · 2020-09-11T10:24:01Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

rafariossaa · 2020-09-14T13:43:02Z

I will left this as on-hold waiting for confirmation.

carrodher · 2022-10-20T13:18:57Z

Unfortunately, this issue was created a long time ago and although there is an internal task to fix it, it was not prioritized as something to address in the short/mid term. It's not a technical reason but something related to the capacity since we're a small team.

Being said that, contributions via PRs are more than welcome in both repositories (containers and charts). Just in case you would like to contribute.

During this time, there are several releases of this asset and it's possible the issue has gone as part of other changes. If that's not the case and you are still experiencing this issue, please feel free to reopen it and we will re-evaluate it.

borgez mentioned this issue Jun 10, 2020

$ in password break install chart #2771

Closed

stale bot added the stale 15 days without activity label Jul 1, 2020

stale bot removed the stale 15 days without activity label Jul 1, 2020

stale bot added the stale 15 days without activity label Jul 17, 2020

stale bot removed the stale 15 days without activity label Jul 20, 2020

joancafom mentioned this issue Jul 22, 2020

[rabbitmq] Don't Regenerate the Password #3094

Closed

stale bot added the stale 15 days without activity label Aug 6, 2020

stale bot removed the stale 15 days without activity label Aug 9, 2020

stale bot added the stale 15 days without activity label Aug 27, 2020

stale bot removed the stale 15 days without activity label Aug 27, 2020

stale bot added the stale 15 days without activity label Sep 11, 2020

rafariossaa added the on-hold Issues or Pull Requests with this label will never be considered stale label Sep 14, 2020

stale bot removed the stale 15 days without activity label Sep 14, 2020

carrodher closed this as completed Oct 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[postgresql-ha] delete pods in ha break cluster #2805

[postgresql-ha] delete pods in ha break cluster #2805

borgez commented Jun 10, 2020

marcosbc commented Jun 11, 2020

borgez commented Jun 11, 2020 •

edited

Loading

borgez commented Jun 11, 2020

borgez commented Jun 11, 2020

joancafom commented Jun 15, 2020

borgez commented Jun 15, 2020

borgez commented Jun 15, 2020

joancafom commented Jun 16, 2020

stale bot commented Jul 1, 2020

borgez commented Jul 1, 2020

joancafom commented Jul 2, 2020

stale bot commented Jul 17, 2020

joancafom commented Jul 20, 2020 •

edited

Loading

borgez commented Jul 20, 2020

stale bot commented Aug 6, 2020

tombh commented Aug 9, 2020

joancafom commented Aug 11, 2020

stale bot commented Aug 27, 2020

rafariossaa commented Aug 27, 2020

tombh commented Aug 27, 2020

stale bot commented Sep 11, 2020

rafariossaa commented Sep 14, 2020

carrodher commented Oct 20, 2022

[postgresql-ha] delete pods in ha break cluster #2805

[postgresql-ha] delete pods in ha break cluster #2805

Comments

borgez commented Jun 10, 2020

marcosbc commented Jun 11, 2020

borgez commented Jun 11, 2020 • edited Loading

borgez commented Jun 11, 2020

borgez commented Jun 11, 2020

joancafom commented Jun 15, 2020

borgez commented Jun 15, 2020

borgez commented Jun 15, 2020

joancafom commented Jun 16, 2020

stale bot commented Jul 1, 2020

borgez commented Jul 1, 2020

joancafom commented Jul 2, 2020

stale bot commented Jul 17, 2020

joancafom commented Jul 20, 2020 • edited Loading

borgez commented Jul 20, 2020

stale bot commented Aug 6, 2020

tombh commented Aug 9, 2020

joancafom commented Aug 11, 2020

stale bot commented Aug 27, 2020

rafariossaa commented Aug 27, 2020

tombh commented Aug 27, 2020

stale bot commented Sep 11, 2020

rafariossaa commented Sep 14, 2020

carrodher commented Oct 20, 2022

borgez commented Jun 11, 2020 •

edited

Loading

joancafom commented Jul 20, 2020 •

edited

Loading