[BUG] Imprecise duration logged by MasterService #7849
Comments
I think that's not quite right. The time value will only have a precision of the |
See issue #2000 and the linked PR for a similar problem caused by cached time. The linked PR has some performance data. There is also a microbenchmark for measuring the performance of System.nanoTime and System.currentTimeMillis |
Thanks @andrross for sharing this issue. Nice discussion around different time captures. |
@shwetathareja Do you have any thoughts here? |
@andrross To add to your argument, cluster state update and associated listener are not too frequent. Not expecting too many of these calls within a second, and errors in order of 2-5 ms should be acceptable here which is far more than expected inaccuracy. |
@andrross / @sandeshkr419 I agree that during cluster state processing we need more accurate durations, using the system clock and not cached time. Cluster state changes are not frequent, and we need a more accurate breakdown of how much time each listener takes, or whether calculating the new state itself is taking longer. This would help identify bottlenecks in the cluster state update code path. |
For identifying bottlenecks I would prefer JMH benchmarks or profiling rather than debug logs if a +/- 200ms changes the course of action. |
@Bukhtawar I think we can have guidance on when to use system clocks vs cached times. JMH benchmarks are good, but we don't know all the scenarios upfront. Having these logs also helps debug when an issue appears in production and cluster state processing and applying suddenly starts taking really long. These logs were added with that intention in the first place. |
@shwetathareja @Bukhtawar I tend to agree. Also, I think the cat is already out of the bag here in that we have a mix of cached time vs system clock:
|
@Bukhtawar / @shwetathareja / @andrross - I am wondering if there is any downside to reducing the That also avoids unnecessary invocations to System.nanoTime like in above PR |
@jainankitk I honestly don't know. See this comment for a test that showed doing |
@jainankitk I have a doubt regarding using cached time. Does setting the cached time clock to 1ms imply that the cached clock refreshes (invokes the system clock) every 1 ms? If yes, can a cached clock with a 1ms interval cause a regression? The code in scope of this issue is not invoked aggressively. If other places require a lower cached time interval, we can modify this code to use the new 1ms cached time as well, but introducing a finer cached time just for this scope does not seem reasonable. |
@jainankitk This would be one thread per threadpool right? We can benchmark to see the impact. |
Describe the bug
Execution time computations in MasterService.java report 0s when the computed time is less than 200ms. This is because the time computations use
threadPool.relativeTimeInMillis()
which is useful when we require time precision in seconds. These cluster state computations and publications are of the order of ms.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Following Log lines should appear:
The above times are not useful, since most of these operations execute in less than 200ms. A more granular time calculation would help fine-tune performance and offer more insight when debug logs are enabled.
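To illustrate why the cached time collapses to 0, here is a minimal, deterministic sketch. It is not the OpenSearch implementation: `CachedClock`, `FakeNanoSource` and `measure` are hypothetical names made up for this example, and the 200ms refresh interval is an assumption taken from the threshold described in this report. The fake time source stands in for `System.nanoTime()` so the result does not depend on real scheduling.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class CachedClockDemo {

    /** Hypothetical cached clock: its value only changes when refresh() is called. */
    static final class CachedClock {
        private final LongSupplier nanoSource;
        private long cachedMillis;

        CachedClock(LongSupplier nanoSource) {
            this.nanoSource = nanoSource;
            refresh();
        }

        /** Re-reads the underlying source; a real cached clock would do this periodically. */
        void refresh() {
            cachedMillis = TimeUnit.NANOSECONDS.toMillis(nanoSource.getAsLong());
        }

        /** Returns the last cached value, not the current time. */
        long relativeTimeInMillis() {
            return cachedMillis;
        }
    }

    /** Hypothetical fake time source standing in for System.nanoTime(). */
    static final class FakeNanoSource implements LongSupplier {
        private long nanos;

        void advanceMillis(long millis) {
            nanos += TimeUnit.MILLISECONDS.toNanos(millis);
        }

        @Override
        public long getAsLong() {
            return nanos;
        }
    }

    /**
     * Simulates a task of taskMillis while the cached clock refreshes every
     * refreshIntervalMillis. Returns { elapsed via cached clock, elapsed via direct read }.
     */
    static long[] measure(long taskMillis, long refreshIntervalMillis) {
        FakeNanoSource source = new FakeNanoSource();
        CachedClock cached = new CachedClock(source);

        long cachedStart = cached.relativeTimeInMillis();
        long directStart = source.getAsLong();

        // Advance simulated time 1ms at a time; refresh only on interval boundaries.
        for (long t = 1; t <= taskMillis; t++) {
            source.advanceMillis(1);
            if (t % refreshIntervalMillis == 0) {
                cached.refresh();
            }
        }

        long cachedElapsed = cached.relativeTimeInMillis() - cachedStart;
        long directElapsed = TimeUnit.NANOSECONDS.toMillis(source.getAsLong() - directStart);
        return new long[] { cachedElapsed, directElapsed };
    }

    public static void main(String[] args) {
        long[] shortTask = measure(50, 200);
        System.out.println("50ms task:  cached=" + shortTask[0] + "ms direct=" + shortTask[1] + "ms");
        long[] longTask = measure(450, 200);
        System.out.println("450ms task: cached=" + longTask[0] + "ms direct=" + longTask[1] + "ms");
    }
}
```

In this sketch a 50ms task measures as 0ms through the cached clock and 50ms through the direct read, while a 450ms task measures as 400ms vs 450ms. The cached value is only ever accurate to the refresh interval, which matches the 0s durations described above.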
OpenSearch Data Configuration:
Suggestive Solution
Suggesting to use
System.currentTimeMillis()
instead of using cached time from ThreadPool. Other classes might benefit from the same change; we can start here and change them incrementally as we encounter them, since enabling debug logs and checking each API's results might not be feasible.
Plugins
None enabled, not required.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
I was trying to optimize ClusterState creation activity (ref: #7002) when I encountered this.