Metrics data loss in K8S controller #3607
Comments
This sounds like it is working as intended.
See #2478 for context. |
In my specific scenario, the controller metrics had already malfunctioned before the restart.
|
That's a good point - will have to attempt to replicate 🤔 |
In our use case, Agones is configured with 10 fleets, and each fleet has a fleet autoscaler enabled. Additionally, 10 separate gameservers have been configured, which are not managed by the fleets. Hope this can help you successfully reproduce the issue. Thank you for your hard work. |
Hello, markmandel, I hope this message finds you well. I wanted to follow up on the issue. I understand that replicating the issue can sometimes be challenging, and I'm wondering if there's any additional information or assistance I can provide to facilitate the process. If any specific scenarios, logs, or system configurations would be helpful, please let me know. I’m also willing to assist with testing or any other tasks that might help you address the issue more efficiently. Looking forward to your guidance on how I can best support your efforts. Thank you for your time and attention to this matter. Best regards |
Sorry this isn't currently at the top of my priority queue, so haven't had a chance to look at it. Would definitely be happy to provide pointers if you wanted to dig into it? |
Thank you for your prompt response. I completely understand that this
issue may not be your top priority at the moment. I appreciate your
willingness to provide pointers for further investigation.
If there's a more suitable time for you to delve into this matter or
if you have any initial thoughts to share, I would be grateful for any
guidance you can provide.
Looking forward to your insights.
|
If you would like to go digging (and I encourage it!), all these metrics are managed here: https://github.com/googleforgames/agones/tree/main/pkg/metrics Feel free to drop questions here, or in the #development channel on our Slack! |
In Agones version 1.35.0, disabling the FeatureGate: "ResetMetricsOnDelete" can resolve issues with metrics anomalies. Through an in-depth analysis of the source code, I've discovered that this feature can lead to certain memory optimization benefits. However, it also results in an increase in code complexity. Notably, during this optimization process, there seems to be a bug within the code that causes anomalies in the metrics indicators. Based on these findings, I will attempt to fix this issue and provide a pull request (PR) if everything goes smoothly. |
Thanks for digging in! |
* fix: #3607 Metrics data loss in K8S controller
* add unit test for #3607
Co-authored-by: Zach Loafman <[email protected]>
Co-authored-by: Mark Mandel <[email protected]>
What happened:
After restarting the K8S controller, the "agones_gameservers_total" metric is no longer being collected for the "shipping-mode1-map1-3568" battle server. However, the "agones_gameservers_count" metric is still being collected.
What you expected to happen:
I expected both the "agones_gameservers_total" and "agones_gameservers_count" metrics to continue being collected consistently, even after the controller restart.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Agones version: 1.35.0
Kubernetes version (use kubectl version):
Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.22.5-tke.19
Cloud provider or hardware configuration:
Install method (yaml/helm): helm
Troubleshooting guide log(s):
Others: