Centralized Telegraf Manager #7478
Hey @raider111111, I appreciate you raising this issue, as it mirrors some discussions we've been having on the team. I might like to split off the "monitoring health of agents" conversation into something separate, since Telegraf agents already have an internal plugin for self-monitoring.

For the config server, we're thinking of something like a Telegraf-config service that can store and manage a central config repository, holding config templates for certain types of machines. Machines would then pull the config type they want, use local environment variables for some settings, and start Telegraf.

I'd love to hear your thoughts and what you currently use for configuration management.
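The pull side of such a config service could be very small: fetch a template, fill in local environment variables, and write the result where Telegraf expects it. A minimal sketch of the substitution step, where the config-service URL and template names are purely illustrative (Telegraf itself also supports `${VAR}` substitution natively, so this is only to show the idea):

```python
import re

def render_config(template: str, env: dict) -> str:
    """Substitute ${VAR} placeholders with values from the given environment,
    leaving unknown placeholders untouched."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), m.group(0)), template)

# Fetching from a hypothetical config service would look something like:
#   import urllib.request
#   template = urllib.request.urlopen("http://config-server/telegraf/webserver.conf").read().decode()

template = '[[outputs.influxdb]]\n  urls = ["${INFLUX_URL}"]\n'
print(render_config(template, {"INFLUX_URL": "http://influxdb:8086"}))
```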
@ssoroka, how about integration with ZooKeeper?
I've been thinking there are two separate pieces: key/value stores for service discovery and secrets, and full config storage. With ZooKeeper/Consul/etcd you may want to pull certain variables and use them in an existing configuration. There is also configuration storage, where you store the full configs, plugin configs, or configuration templates. These two sources of information need to be combined to produce the final configuration.

Right now we have analogues of both: the full config is local-only and can be split across the config directory, and for key/value lookups we have environment variable support in the config file. I think we would probably keep it a two-step process, so that you can mix and match where the configuration is kept and where the variables are kept. So for ZooKeeper, we may support a way of grabbing variables and using them in the configuration. Maybe something along the lines of:

```toml
[[variables.zookeeper]]
  # zookeeper connection settings

[[inputs.http]]
  urls = ["$zookeeper{some_key}"]
```

I'm not sure how we would handle advanced tasks such as producing multiple plugins from a list of keys. I'm not sold on introducing an official template format built into Telegraf; it may make more sense to layer this on the outside.
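The resolution step for such `$zookeeper{...}` placeholders could be layered on the outside exactly as suggested. A minimal sketch, using a plain dict in place of a real ZooKeeper client (in practice the lookup would call something like kazoo's `get()`); the placeholder syntax is taken from the proposal above, everything else is illustrative:

```python
import re

def resolve_zookeeper_vars(config: str, lookup) -> str:
    """Replace $zookeeper{key} placeholders using a key/value lookup function."""
    return re.sub(r"\$zookeeper\{([^}]+)\}", lambda m: lookup(m.group(1)), config)

# A dict stands in for a real ZooKeeper client here:
kv = {"some_key": "http://app-server:9100/metrics"}
config = '[[inputs.http]]\n  urls = ["$zookeeper{some_key}"]\n'
print(resolve_zookeeper_vars(config, kv.__getitem__))
```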
@raider111111 I would like to share the solution for monitoring agent status that we use in our environment:

```toml
[[inputs.internal]]

[[inputs.procstat]]
  systemd_unit = "telegraf.service"
  namepass = [ "procstat_lookup" ]
  [inputs.procstat.tags]
    appl = "telegraf"
```

With these plugins, each agent marks itself in the procstat_lookup measurement, which reports its status and the host where it is located. We then display their status on a Grafana dashboard, where a lack of data is interpreted as a problem with the agent. This is not an ideal solution for a number of reasons, but it works. Hope you find this helpful.
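The "lack of data means a broken agent" check described above can be expressed as a per-host query in Grafana. A sketch in InfluxQL, assuming the `procstat_lookup` measurement from the config above; the `running` field name and the 5-minute window are assumptions, not from the thread:

```sql
-- Hosts that have reported recently; any host missing from this result
-- (or with a stale last() value) is treated as a problem agent.
SELECT last("running") FROM "procstat_lookup"
WHERE "appl" = 'telegraf' AND time > now() - 5m
GROUP BY "host"
```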
@danielnelson, didn't you think about downloading configurations to a directory defined through a config option?
What's different in the fork? Does it add a new plugin that downloads files and sends SIGHUP? You could definitely do this without a fork using the execd input. Another popular tool for doing this is confd.
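The download-and-SIGHUP pattern mentioned here is simple enough to sketch without a fork. A minimal illustration, assuming the standard `/etc/telegraf/telegraf.conf` path and a `pidof`-discoverable telegraf process; the download URL and the decision to compare by hash are assumptions, not anything from the thread:

```python
import hashlib
import os
import signal
import subprocess

def config_changed(new: bytes, path: str) -> bool:
    """Return True if the downloaded config differs from the file on disk."""
    if not os.path.exists(path):
        return True
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).digest() != hashlib.sha256(new).digest()

def reload_telegraf() -> None:
    """Send SIGHUP to the running telegraf process so it reloads its config."""
    pid = int(subprocess.check_output(["pidof", "telegraf"]).split()[0])
    os.kill(pid, signal.SIGHUP)

# Usage sketch (URL is illustrative): download the config with urllib, then
#   if config_changed(data, "/etc/telegraf/telegraf.conf"):
#       write the file and call reload_telegraf()
```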
Thanks for the idea, I'll try to do it the way you suggest! I should add that, as far as I know, the fork was made from version 1.6-1.7, when Telegraf did not yet have the execd plugin, probably could not work with environment variables (I could be wrong), and could not load the configuration via HTTP. At the time, the approach I described seemed appropriate, but now your solution is more correct.
Thanks @M0rdecay, @danielnelson, and @ssoroka!!
@raider111111 What we do is use a repo with the config setup and an Ansible playbook to install the agents across our inventory. We use Ansible AWX to maintain the state of all the agents by running the playbook multiple times a day. We provide sane defaults for what should be monitored, but also allow "extras" in our inventory; when a target is configured with an extra, it is deployed with an additional conf.d fragment. We have several thousand agents deployed in this manner.

With regard to state monitoring, we only care if an agent is not submitting as it should, so we have global alerts on agents that stop reporting. Does that help?
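The defaults-plus-extras layout described above maps naturally onto two Ansible tasks. An illustrative fragment, where the template names, destination paths, and the `telegraf_extras` inventory variable are all assumptions rather than details from the thread:

```yaml
- name: Deploy baseline Telegraf config
  ansible.builtin.template:
    src: telegraf.conf.j2
    dest: /etc/telegraf/telegraf.conf
  notify: restart telegraf

- name: Deploy "extra" conf.d fragments for hosts that request them
  ansible.builtin.template:
    src: "extras/{{ item }}.conf.j2"
    dest: "/etc/telegraf/telegraf.d/{{ item }}.conf"
  loop: "{{ telegraf_extras | default([]) }}"
  notify: restart telegraf
```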
This is a great way to do it, and I would follow @pberlowski's practice of monitoring for agents that stop reporting data.
Closing. See discussion in #272 |
Thanks @ssoroka, @pberlowski! Yes, this is what we're currently doing. |
@raider111111 I can walk you through what we have for the centralized Telegraf management if you want to connect. I'll put a blog post on the topic down on my todo list as well, so I can share this knowledge and start a discussion. |
Ansible is a push model, and I'd be interested in more of a central config repository for Telegraf itself, though as far as individual solutions go, it's just whatever fits your organization best. I'm curious whether anyone is using Kubernetes to deploy Telegraf alongside other service pods?
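Deploying Telegraf alongside a service pod is usually done as a sidecar container sharing the pod. A minimal sketch, in which every name (the app image, the ConfigMap, the Telegraf version) is illustrative only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-telegraf
spec:
  containers:
    - name: app
      image: example/app:latest
    - name: telegraf          # sidecar scraping/forwarding metrics for the app
      image: telegraf:1.14
      volumeMounts:
        - name: telegraf-config
          mountPath: /etc/telegraf
  volumes:
    - name: telegraf-config
      configMap:
        name: telegraf-sidecar-config
```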
We're just finalizing an operator for Kubernetes that installs "metrics" custom resources for our internal customers. Our main driver is that we want to maintain control over the agent (to allow version and config updates) while sharding it across namespaces.
@pberlowski Did you end up writing a blog post on this? I wasn't able to track down a blog from your GitHub profile. I would be very interested in what you were able to get done in that space. Thanks.
Feature Request
Proposal:
A centralized way to monitor the health of the agents as well as deploy new configuration files to a server or groups of servers remotely.
Current behavior:
Telegraf appears to need to be monitored using a separate tool, like Wavefront, and the configurations have to be deployed using a desired-state tool like Puppet.
Desired behavior:
A centralized way to monitor the health of the agents as well as deploy new configuration files to a server or groups of servers remotely.
Use case:
The ability to monitor the health of the Telegraf agents is important.
The ability to deploy new configuration files to thousands of servers would be extremely useful in our environment.