Documentation: enumerate self-hosted etcd operator failure scenarios #257
Comments
@justinsb what others am I missing? |
That's a good start. A few more off-the-top-of-my-head:
(These are the standard gotchas of self-hosting, but I guess there's a particularly likely scenario for etcd upgrades)
|
There are recovery tools being built in bootkube now: https://github.com/kubernetes-incubator/bootkube#recover-a-downed-cluster |
Quick notes on places to start on writing these docs:
|
I agree it is a good idea to write docs about handling the failure cases for self hosted etcd. The items you listed are a good start! I believe the doc should focus on the differences between self hosted and external etcd, and highlight the potential risks self hosted etcd might introduce and how we solve them. Many of the items you listed are not really specific to self hosted etcd, in my opinion.
If you manually operate etcd, you might have these issues too. They are not introduced by self hosted etcd.
These two are relevant. We will cover them when writing the doc. But I would suggest you give self hosted etcd a try if you are interested, so we can discuss in more depth. |
@radhikapc is working on etcd docs now. Assigning to her, @xiang90 |
Hi @radhikapc, any update on that one? |
@Quentin-M hongchao or I need to write something similar to https://github.com/coreos/etcd/blob/master/Documentation/op-guide/failures.md. Then @radhikapc can start helping to clean things up. We will get started after finishing up the TLS thing. |
… documentation for etcd operator. ref: coreos#257
Where was this moved to? |
Currently we can bring up self-hosted etcd operator setups, but we don't document failure or recovery scenarios even though we handle many of them.
cc @jbeda @justinsb @luxas