
Kubernetes internal cert only valid for ip 10.0.0.1 #399

Closed
Vlatombe opened this issue May 29, 2018 · 21 comments

Comments

@Vlatombe

I recreated an AKS cluster recently (k8s 1.9.6) and I'm experiencing failures accessing the kubernetes endpoint through kubernetes.default.svc from inside a pod.

Hostname kubernetes.default.svc not verified:
    certificate: sha256/+VQ6mcPU2cYTD1eo8qsXkyNPrPYhz+Ju2p37panRNK0=
    DN: CN=client, O=system:masters
    subjectAltNames: [10.0.0.1]

This used to work before so I assume something has changed.
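The failure mode can be reproduced locally: hostname verification of kubernetes.default.svc fails because the certificate's subject alternative names contain only the service IP. A minimal sketch (assuming OpenSSL 1.1.1+ for `-addext`; the file paths and subject are made up to mimic the broken cert above):

```shell
# Generate a throwaway self-signed cert whose only SAN is IP:10.0.0.1,
# mimicking the broken AKS API server cert reported above
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/key.pem -out /tmp/cert.pem -days 1 \
  -subj "/CN=client/O=system:masters" \
  -addext "subjectAltName=IP:10.0.0.1"

# Inspect the SANs: no DNS entries appear, so any TLS client verifying
# the hostname kubernetes.default.svc against this cert must reject it
openssl x509 -text -noout -in /tmp/cert.pem | grep -A1 "Subject Alternative Name"
```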

@tomconte
Member

Might be related to Azure/acs-engine#2656 ? ("Adding parameter to specify more master SANs")

@Vlatombe
Author

Looks related, although the unit tests in this PR look fine (checking for the expected standard hostnames)

@Vlatombe
Author

Although I think the API should always be accessible via the standard host names kubernetes.default.svc and kubernetes.default.svc.cluster.local, the kubectl command line is able to resolve the API server using the injected environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT (set to 10.0.0.1 and 443, respectively).

I'm using https://github.com/fabric8io/kubernetes-client to access the Kubernetes API, and it currently doesn't have any lookup mechanism based on environment variables; it defaults to kubernetes.default.svc.

Filed a PR to fix it: fabric8io/kubernetes-client#1086
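The env-var-based lookup mentioned above can be sketched in shell (the variable names are the standard ones the kubelet injects into every pod; the values 10.0.0.1/443 are the ones quoted in this thread):

```shell
# Simulated values of the env vars injected into every pod; in a real pod
# these are already set by the kubelet
KUBERNETES_SERVICE_HOST=10.0.0.1
KUBERNETES_SERVICE_PORT=443

# A client that honors these vars connects by IP, which the broken cert's
# SAN list (IP Address:10.0.0.1) does cover, so verification succeeds
API_URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
echo "$API_URL"   # prints https://10.0.0.1:443
```

This is why kubectl keeps working inside affected clusters while DNS-name-based clients fail hostname verification.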

@FeetInAncientTime

This problem may also affect jobs submitted to Apache Spark on AKS:

...
io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
	at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
	... 8 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified:
    certificate: sha256/<REDACTED>=
    DN: CN=client, O=system:masters
    subjectAltNames: [10.0.0.1]
...

@ams0

ams0 commented Jun 6, 2018

Hitting it too on 1.9.6 trying to deploy zalando/zelenium chart.

@rdesiano

rdesiano commented Jun 7, 2018

I'm also hitting this on 1.9.6 and 1.8.11 when using the Jenkins k8s plugin and it tries to query the kube API.

@tomconte
Member

tomconte commented Jun 7, 2018

Just tried this on a fresh AKS cluster built yesterday (westeurope).

Running from inside a pod...

Extract certificate: echo | openssl s_client -showcerts -connect kubernetes.default.svc:443 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'

(save it to cert.pem)

Check subject alt names: openssl x509 -text -noout -in cert.pem | grep DNS

Result:

DNS:hcp-kubernetes, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:hcp-kubernetes.5b17d1e6eb90d9000143b715.svc.cluster.local, DNS:cap-cap-252281-432c9c63.hcp.westeurope.azmk8s.io, IP Address:10.0.0.1

I think this looks OK? Is it worth re-testing on a new cluster?

@Vlatombe
Author

Vlatombe commented Jun 7, 2018

Hi @tomconte,

looks good, I'll rebuild my test cluster to verify.

@Vlatombe
Author

Vlatombe commented Jun 7, 2018

On eastus it still doesn't work; the certificate only has the IP address 10.0.0.1.

@Vlatombe
Author

Vlatombe commented Jun 7, 2018

@tomconte I confirm it works in the westeurope region; eastus is still broken.

@Vlatombe
Author

Vlatombe commented Jun 7, 2018

@tomconte Looks like it is not so simple. I rebuilt several clusters in westeurope today, and the certificate is correct in some cases but not in others.

@bramvdklinkenberg

Like @ams0, I ran into this issue when deploying the zalenium helm chart on AKS.
Adding the environment variable below to the chart fixed it for me.

@rdesiano

rdesiano commented Jun 7, 2018

I've provisioned three clusters today in US East, 1.7.16, 1.8.10, and 1.9.6 and was able to verify that all of them had correctly configured certs via:
root@jenkins-59486b7cf7-mb6ww:/# echo | openssl s_client -showcerts -connect kubernetes.default.svc:443 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > cert.pem

and then:

root@jenkins-59486b7cf7-mb6ww:/# openssl x509 -text -noout -in cert.pem | grep DNS
DNS:hcp-kubernetes, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:hcp-kubernetes.5b197b31ccc65a0001be1578.svc.cluster.local, DNS:pillars-sandbox-1-9-6-247e0b08.hcp.eastus.azmk8s.io, IP Address:10.0.0.1

It's worrisome that this issue is so unpredictable, but for now I can't replicate it.

@tomconte
Member

tomconte commented Jun 8, 2018

The issue seems to be intermittent. We are still investigating.

@weinong
Contributor

weinong commented Jun 8, 2018

We identified the bug. This impacts AKS clusters with a newer infrastructure feature. We will update here once the rollout is completed.
Thanks for reporting it!

@bs-matil

@weinong any idea of the timeline of that fix?

@carota24

Having the same issue, any timeline?

@bs-matil

@carota24
If it's an option for you: just create a new cluster (we created about 30) and you will eventually get one with a valid certificate.

@tomconte
Member

AFAICT the fix should be deployed this week, maybe @rite2nikhil can confirm?

@Vlatombe
Author

The issue is now fixed for me with a new AKS cluster.


@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2020