self deployment / pull mode #320
For the pull mode, what do you think about:
1. Create the etcd cluster in a way that it can be scaled; we need to discuss what the proper way would be: static or discovery.
2. Use the etcd cluster to store the inventory, and use a dynamic inventory.
3. A big issue is secrets management: where do we store the certs/tokens? How do we sync them between the nodes? Do we have to create a cert per node? ...
4. Use ansible pull mode.
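Point 2 above (an etcd-backed dynamic inventory) could be sketched roughly like this. This is only an illustration, not anything kargo ships: the etcd endpoint, the `/kargo/inventory` key layout, and the fallback behavior are all assumptions.

```python
#!/usr/bin/env python3
# Hypothetical dynamic-inventory sketch for Ansible: group membership is
# kept under a single etcd key and emitted in the JSON shape that
# `ansible -i <script>` expects. The endpoint and key path are assumptions.
import json
import urllib.request

ETCD_URL = "http://127.0.0.1:2379/v2/keys/kargo/inventory"  # assumed layout


def to_ansible_inventory(groups):
    """Convert {'group': ['host', ...]} into Ansible inventory JSON."""
    inventory = {name: {"hosts": list(hosts)} for name, hosts in groups.items()}
    inventory["_meta"] = {"hostvars": {}}
    return inventory


def fetch_groups(url=ETCD_URL):
    """Read the inventory document stored under one etcd v2 key."""
    with urllib.request.urlopen(url) as resp:
        node = json.load(resp)["node"]
    return json.loads(node["value"])  # the key's value is itself JSON


def main():
    try:
        groups = fetch_groups()
    except OSError:
        # etcd unreachable: emit an empty inventory rather than crash,
        # which also illustrates the chicken-and-egg problem discussed below.
        groups = {}
    print(json.dumps(to_ansible_inventory(groups), indent=2))


if __name__ == "__main__":
    main()
```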
For public cloud consumers, etcd discovery is probably optimal since it almost never results in a broken cluster. Anyone deploying in-house might be reluctant to use discovery; an initial cluster array is adequate already.

Dynamic inventory plus deploying etcd via ansible creates a chicken-and-egg problem: you can't use an inventory from etcd until etcd is up. Also, you need a way to populate this etcd. I would vote against adding complexity just for the sake of finding an innovative way to consume etcd.

Secrets management is a topic I've dealt with in previous projects. We currently have one master host which knows all the information. If you want to move to client-pull mode, all clients need to know where the host(s) that know the secrets are located. Secret file storage should be replicated and transmitted using an encrypted method (ansible's SSH/rsync transport is totally fine). I think you should add a new role for secrets, where the first alphabetical node actually generates the secrets and the other masters take a full copy. All other nodes only take the secrets as needed. It's important to ensure that scale-up/scale-down scenarios are covered.
Thank you @mattymo for your answer. We can let the user choose the way he wants to deploy the etcd cluster. I understand that etcd would become a strong dependency, but when, for instance, a new node is added, it needs to know about the cluster topology (where the api is, the etcd, ...). Regarding the secrets,
this is the reason why we need an inventory.
i'm probably missing something, but why not consider DNS discovery with SRV records vs. etcd discovery?
This is one of the discovery options that etcd offers, and i'm actually considering it @v1k0d3n
@RustyRobot, i need your input here too :)
it's always been the easiest for me when building and tearing down etcd clusters for kubernetes during testing (granted, i've been pulled away from doing this in recent months so some of the syntax may have changed with etcd2/3). i just created srv records on my dns server:
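The records themselves look roughly like this. The `_etcd-server._tcp` and `_etcd-client._tcp` names are what etcd queries during SRV discovery; the `example.com` zone and the `kubetcd0N` hostnames are placeholders, not anything from this thread.

```
; SRV records etcd looks up during DNS discovery (placeholder zone example.com)
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 kubetcd01.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 kubetcd02.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 kubetcd03.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 kubetcd01.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 kubetcd02.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 kubetcd03.example.com.
```

Each target hostname also needs an ordinary A record resolving to the member's IP.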
and then configured the etcd cluster for dns discovery (example for kubetcd01)...
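A member configuration for that setup could look like the following. This is a hedged sketch: `--discovery-srv` is etcd's real SRV-discovery flag, but the domain, hostnames, and plain-HTTP URLs are placeholders (a real deployment would use TLS).

```shell
# Hypothetical flags for member kubetcd01; domain and URLs are placeholders.
etcd --name kubetcd01 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://kubetcd01.example.com:2380 \
  --advertise-client-urls http://kubetcd01.example.com:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --listen-client-urls http://0.0.0.0:2379 \
  --initial-cluster-state new
```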
I would not use ansible-pull if possible. Also, about the all-in-one image, I'll detail more later.
@ant31 yes, please do :)
@v1k0d3n how would you delete or add members?
i would let the users control that on the DNS side, and use proxy mode on the etcd members: destroy and/or rebuild, then add via dns. i mean, it's Raft... so 3 or 5 members is ideal; how many members do you really want over that? my biggest stumbling block right now with kargo is that i have this great srv/dns framework in place that i can't use to bring up etcd. :(
@Smana @v1k0d3n What I know about the current state of etcd is that there is no other way to manage it except keeping a static list of etcd members and synchronizing it with the etcd cluster by explicitly calling the member add/remove commands. There is a good video on life-cycle management of etcd from CoreOS Fest, which took place in Berlin. Basically, the presenter had to invent a new tool on top of etcd in order to do proper cluster management; it's not a trivial task, and I would suggest going with a static list as the simplest and most straightforward solution, until something like that is supported by etcd natively.
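For reference, the static approach boils down to passing the full member list at bootstrap and growing the cluster later through explicit member-management calls. The node names and URLs below are placeholders, and `etcdctl member add` is the etcd v3 spelling of the command:

```shell
# Static bootstrap: every member is started with the same --initial-cluster list.
etcd --name node1 \
  --initial-cluster node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
  --initial-cluster-state new

# Scaling up later is a two-step, explicit operation:
etcdctl member add node4 --peer-urls=http://node4:2380
# ...then start etcd on node4 with --initial-cluster-state existing
```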
-> the all-in-one is optional and that's another subject. The only requirement on the host would be docker:

```shell
docker run -e options=... -v /:/rootfs/ --rm kargo-deploy -- init
```

The image kargo-deploy contains ansible + the kargo scripts; we mount the host volume into the container, and with privileged access we can configure it.
To configure the container_engine, I propose to keep and use the current playbooks. Maybe later we can switch to shell scripts instead of ansible to remove the 'python' dependency from the hosts.
i think i'm losing track of what's being discussed in this thread, which is why i started #324 @Smana. giving users an option for how they want to bootstrap etcd distances us from arguing about which method is better and why. in my use case, i'm very specifically looking for a DNS SRV bootstrap discovery method for etcd, and i like the approach of "bring your own [xyz component]" for the project. if users are tied to hard dependencies like ansible-pull, kpm built-in, etc., or if the project becomes less democratic and more opinionated about the etcd bootstrap method, i feel like the target audience will become narrower over time.
The discussion has deviated onto etcd; maybe we should open a new issue to settle the etcd question. This issue is about how to switch kargo from push to pull.
That's the opposite of what we're trying to solve with this issue. This is why using ansible-pull is out! I don't want to install ansible on every host.
@ant31 i agree with you about the docker image which deploys the node where it resides.
Maybe I'm missing something, but why not stand up an etcd cluster with discovery and store secrets in etcd?
@v1k0d3n yes, it's a good idea to use the etcd cluster to store the secrets and the configuration shared by nodes/masters
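A minimal sketch of what that could look like with plain `etcdctl` (v2-style `set`/`get`); the `/kargo/secrets` key layout and file names are made up for illustration. As noted above about encrypted transport, etcd stores values unencrypted, so client/peer TLS and restricted access would be a hard requirement for this to be safe.

```shell
# Hypothetical secrets layout in etcd; key paths and files are placeholders.
# The values are stored in plain text, so TLS + access control are a must.
etcdctl set /kargo/secrets/ca.pem "$(cat ca.pem)"
etcdctl set /kargo/secrets/token "$(head -c 32 /dev/urandom | base64)"

# A joining node pulls only the secrets it needs:
etcdctl get /kargo/secrets/ca.pem > /etc/ssl/etcd/ca.pem
```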
private repo?
@v1k0d3n Sorry, i've changed my mind and closed the repo; i'll try to do a PR instead.
hello! i'm curious if this is still in the works? i'd be interested in contributing :)
@billyoung we're thinking about something, but no real work has been done as far as i know.
/lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Configure the local node. There will probably be some caveats, like the certs/tokens management.