Harvest 22.08 is making many queries to internal DNS #1353
Comments
Hi @faguayot, let me make sure I understand the problem. You're saying that Harvest is making too many DNS requests, in the range of 2-3K DNS requests every two to three minutes? Do you know whether those requests are causing a problem, or are you trying to understand why they are being made? Or is the concern simply that Harvest is making that many requests?

Harvest uses the ZAPI or REST protocols to gather metrics from ONTAP, typically by talking to the cluster management LIF. With the out-of-the-box templates, a single Harvest poller will make concurrent, per-object requests to the cluster for each object listed in the collector's
In cases where there are many ONTAP objects, say 50 thousand qtrees, ONTAP won't return all of them in a single request. Instead, Harvest requests them 500 at a time, which means it takes 100 requests to gather all the qtrees. In other words, the number of requests Harvest sends is a function of the number of objects being monitored, since we request them in chunks. Perhaps the spikes you're seeing occur when the schedules for multiple objects overlap? From a DNS perspective, this shouldn't be a problem though. Are these concurrent requests causing errors?

In terms of DNS, Harvest isn't doing anything DNS related. Harvest only talks HTTPS to ONTAP. The OS will make DNS lookups when those HTTP requests contain hostnames instead of IPs, but all of that happens further down the stack than Harvest.

Some Questions
Hello @cgrinds, For the moment the DNS requests aren't impacting the DNS in terms of performance or availability, but it's something that could happen.
The IP addresses that the DNS requests try to resolve are the following (this is an example from one cluster only): As I said before, these are internal IPs which don't have name resolution.
Thanks for the details and log files @faguayot. We don't see a problem in Harvest that would cause higher than expected DNS queries. So far, it appears these requests are a consequence of Harvest sending REST & ZAPI requests to ONTAP. I'm going to see if we can get permission to wireshark one of our large clusters. I pulled out some counts from your log file (see table below). These stand out because of the high instance and/or metric count.
Hello @cgrinds, Thanks for your checks and the detailed information shared.
The result was that the DNS queries disappeared, so I think you have narrowed down where the problem comes from. We didn't disable the Workload objects because we have been using them for some time and we don't believe they were the problem. To give you more information, the log I shared with you is from a storage array that is based on NFS.
Thanks for the We're confident you have ~39,943 locks, which means it will take Harvest around 80 ZAPI requests to return them all. And while it's only taking 5s to do that, it would not be surprising if those 80 ZAPI requests became multiple DNS requests when ONTAP requests lock information. Now that you've narrowed it down to It could be that when ONTAP queries the active network connections, it needs to do DNS queries to find/validate the connections? In particular, when it tries to return the IP address for remote hosts, connected clients, and client IP connected to each interface. Understood on the Workloads and yes, those have been there since day one and have not changed much, so it's unlikely they're related.
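The "around 80 ZAPI requests" figure follows from the chunking described earlier in the thread, where records are requested 500 at a time. A minimal sketch of that arithmetic; the function name and the `batch_size` parameter are illustrative and are not taken from Harvest's code:

```python
def requests_needed(object_count: int, batch_size: int = 500) -> int:
    """Number of chunked requests needed to fetch object_count records.

    batch_size=500 mirrors the chunk size quoted in this thread; it is an
    assumption here, not a value read from Harvest's configuration.
    """
    if object_count < 0 or batch_size <= 0:
        raise ValueError("object_count must be >= 0 and batch_size > 0")
    # Ceiling division: a partially filled last chunk still costs one request.
    return (object_count + batch_size - 1) // batch_size
```

With these assumptions, `requests_needed(39_943)` gives 80 and `requests_needed(50_000)` gives 100, matching the numbers quoted in the thread. If ONTAP performs one or more name lookups per returned record, the DNS query count would scale with the record count rather than with the request count.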
Good morning @cgrinds, As a first step we disabled Zapi:NFSLock, and there was no change in the queries to the DNS, so it seems this object wasn't the problem. Today we want to continue by disabling the others at different times. When we have those results, I will share them with you. My suspicion is that data collection for Rest:NFSClients is the problem. As you said, when it checks the active network connections ONTAP performs name resolution, but I can't understand why ONTAP does that for the internal IP addresses that are used only by the cluster.
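To spot these lookups in a DNS server's query log, it can help to know what name is being asked for. The sketch below assumes the queries are reverse (PTR) lookups, which the thread implies ("IP addresses that have no name resolution") but does not state outright. The address used is a documentation-only placeholder, not one of the cluster IPs from this issue, and the function name is hypothetical:

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup for `ip` would query."""
    # ipaddress validates the string and raises ValueError if it is not an IP.
    return ipaddress.ip_address(ip).reverse_pointer


# 192.0.2.10 is from the reserved documentation range (TEST-NET-1);
# substitute a real cluster-internal address when checking your own logs.
example_name = ptr_query_name("192.0.2.10")
```

Here `example_name` is `10.2.0.192.in-addr.arpa`. Filtering the DNS log for names built this way from the cluster-internal addresses would show how many of the queries target them.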
@cgrinds Today we ran the tests with the other two objects and discovered that the object generating the many DNS queries was
Thanks for the confirmation @faguayot! That means you will see the same "DNS storm" from the ONTAP CLI since Harvest's REST template for I don't think there is anything Harvest can do about this. It's probably worth opening a case with ONTAP if you want clarification or would like them to reduce the number of DNS requests.
Sorry for the delay in answering. I didn't know which API or ZAPI query that object was making, so thanks again for sharing that helpful information, Chris. What kind of request/query is doing the Thanks for your time on this issue, which wasn't directly a Harvest problem, although Harvest was indirectly involved.
The Since this template is not used by any dashboards we're going to:
Thanks to Alessandro for reporting and LeonardoA for providing the details on See also:
Describe the bug
Harvest 22.08 seems to be making a huge number of queries to our internal DNS. In fact, the IPs being queried are mostly those of the Vserver "Cluster"; those IPs are internal to the cluster. Every 2-3 minutes there are about two peaks of requests, reaching 1.5K-2.5K within seconds. We tried stopping our Harvest instances and the high demand stopped.
Environment
Provide accurate information about the environment to help us reproduce the issue.
bin/harvest start --config=foo.yml --collectors Zapi
Expected behavior
Harvest should not cause any DNS requests, at least for the internal interfaces.
Actual behavior
![image](https://user-images.githubusercontent.com/85484449/196471969-fca24a84-dce0-4c3c-982d-c0b4433d4176.png)
The red line shows the requests for the storage arrays.
Same test but with the harvest instances stopped.
![image](https://user-images.githubusercontent.com/85484449/196473374-9dfb9d72-e34d-40e9-84e4-bec451e7e431.png)
Additional context
I would like to find which collector or object is constantly making those queries so that I can comment it out and avoid any problem for the DNS service. I don't understand why something is making those requests for IP addresses that are internal to each cluster.