Can't get keys from leader #1

Open
ramoncisternas opened this issue Jan 29, 2019 · 5 comments

Comments

@ramoncisternas

Hello Michael,

I just followed your example to learn about StatefulSets in OpenShift, and found that the follower pods can't get keys from the leader. I think the cause is that the hostname the code constructs does not match what DNS can resolve: it uses mehdb-0.axa-partners-chatbot-hogar-preprod-axa-services-es instead of mehdb-0.mehdb.axa-partners-chatbot-hogar-preprod-axa-services-es.svc.cluster.local.

Here is the output of my test:

ramon@bionic-beaver:~/OpenShift $ oc get sts
NAME DESIRED CURRENT AGE
mehdb 3 3 34m

ramon@bionic-beaver:~/OpenShift $ oc scale sts mehdb --replicas=4
statefulset "mehdb" scaled

ramon@bionic-beaver:~/OpenShift $ oc get pods
NAME READY STATUS RESTARTS AGE
mehdb-0 1/1 Running 0 36m
mehdb-1 1/1 Running 0 35m
mehdb-2 1/1 Running 0 33m
mehdb-3 1/1 Running 0 1m

ramon@bionic-beaver:~/OpenShift $ oc logs mehdb-3
2019/01/29 11:49:52 mehdb serving from mehdb-3:9876 using /mehdbdata as the data directory
2019/01/29 11:49:52 I am a follower shard, accepting READS
2019/01/29 11:50:02 Checking for new data from leader
2019/01/29 11:50:02 Can't get keys from leader due to Get http://mehdb-0.axa-partners-chatbot-hogar-preprod-axa-services-es:9876/keys: dial tcp: lookup mehdb-0.axa-partners-chatbot-hogar-preprod-axa-services-es on 10.64.9.9:53: no such host
2019/01/29 11:50:12 Checking for new data from leader
2019/01/29 11:50:12 Can't get keys from leader due to Get http://mehdb-0.axa-partners-chatbot-hogar-preprod-axa-services-es:9876/keys: dial tcp: lookup mehdb-0.axa-partners-chatbot-hogar-preprod-axa-services-es on 10.64.9.9:53: no such host

ramon@bionic-beaver:~/OpenShift $ oc run -i -t --rm dnscheck --restart=Never --image=quay.io/mhausenblas/jump:0.2 -- nslookup mehdb
If you don't see a command prompt, try pressing enter.
Name: mehdb
Address 1: 10.94.107.130 mehdb-2.mehdb.axa-partners-chatbot-hogar-preprod-axa-services-es.svc.cluster.local
Address 2: 10.94.112.4 mehdb-1.mehdb.axa-partners-chatbot-hogar-preprod-axa-services-es.svc.cluster.local
Address 3: 10.94.21.232 mehdb-0.mehdb.axa-partners-chatbot-hogar-preprod-axa-services-es.svc.cluster.local
Address 4: 10.94.9.228 mehdb-3.mehdb.axa-partners-chatbot-hogar-preprod-axa-services-es.svc.cluster.local

I wonder if it would be easy for you to explain the root cause of this error and suggest how it can be fixed.

Thank you in advance,
Ramon Cisternas

@mhausenblas
Owner

Thanks for raising this, @ramoncisternas … nothing that immediately comes to mind but could very well be a bug in my code. Will have a look ASAP.

@denismaggior8

denismaggior8 commented Sep 25, 2019

Hi Michael, I'm getting almost the same error as reported above: "Can't get keys from leader due to Get http://mehdb-0.default:9876/keys: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)".

I think this is a DNS issue, or more precisely an issue with the way you construct the leader's URL. The leader can't be resolved as "mehdb-0.default", but it does resolve as "mehdb-0.mehdb.default", so I think the right way to construct its URL is pod_name.service.namespace.

So I would change this line of code

url := "http://" + leaderShard + "." + ns + ":" + port + "/keys"

to

url := "http://" + leaderShard + "." + "THE SERVICE NAME FROM YAML" + "." + ns + ":" + port + "/keys"

Could you please verify the code at your end?

Thanks for the work you've done so far; it helped me demo the StatefulSet behaviour (apart from the followers not being able to get keys from the leader ;-) )

@jeffhoek

I have the aforementioned fix in place, i.e. in main.go:

-	url := "http://" + leaderShard + "." + ns + ":" + port + "/keys"
+	url := "http://" + leaderShard + "." + "mehdb" + "." + ns + ":" + port + "/keys"
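
For reference, the same change can be sketched as a small standalone helper. This is only an illustration, not the code in main.go: the function name leaderKeysURL and its svc parameter are hypothetical, and it assumes the headless service is named "mehdb" as in the diff above.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// leaderKeysURL builds the URL a follower would use to fetch keys from the
// leader. StatefulSet pods are resolvable as pod.service.namespace, so the
// headless service name (svc) has to sit between the pod name and namespace.
// NOTE: this helper and its parameter names are hypothetical.
func leaderKeysURL(leaderShard, svc, ns, port string) string {
	host := leaderShard + "." + svc + "." + ns
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(host, port),
		Path:   "/keys",
	}
	return u.String()
}

func main() {
	// Assumed values: pod "mehdb-0", service "mehdb", namespace "default".
	fmt.Println(leaderKeysURL("mehdb-0", "mehdb", "default", "9876"))
}
```

With those assumed values this yields http://mehdb-0.mehdb.default:9876/keys, which matches the name that resolved for denismaggior8.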

However I'm hitting another issue (open /mehdbdata/test/content: no such file or directory):

oc run -i -t --rm jumpod --restart=Never --image=quay.io/mhausenblas/jump:0.2 -- sh
If you don't see a command prompt, try pressing enter.
~ $ echo "test data" > /tmp/test
~ $ curl -sL -XPUT -T /tmp/test mehdb:9876/set/test
open /mehdbdata/test/content: no such file or directory

This looks like a permission issue:

oc exec -it mehdb-0 -- /bin/bash
bash-4.4$ ls -alh /mehdbdata/
total 12K
drwxr-xr-x. 2   99   99 4.0K Nov 15 05:24 .
drwxr-xr-x. 1 root root 4.0K Nov 15 13:34 ..
bash-4.4$ touch /mehdbdata/test
touch: cannot touch '/mehdbdata/test' Permission denied
bash-4.4$ ls -lZ /mehdbdata/ -d
drwxr-xr-x. 2 99 99 system_u:object_r:nfs_t:s0 4096 Nov 15 05:24 /mehdbdata/
bash-4.4$ id 2
uid=2(daemon) gid=2(daemon) groups=2(daemon)

@mhausenblas
Owner

Thanks @jeffhoek! I'm a little unsure what you want me to do? I can't reproduce it.

@jeffhoek

jeffhoek commented Nov 18, 2019

I was able to get it running on OpenShift 3.11, with the following steps:

oc create sa mehdb
oc adm policy add-scc-to-user privileged -z mehdb

then in app.yaml add the following to the pod template spec (spec.template.spec) of the StatefulSet:

      serviceAccountName: mehdb
