[BUG] Imprecise duration logged by MasterService #7849

Closed
sandeshkr419 opened this issue May 31, 2023 · 13 comments · Fixed by #7902

Labels: bug (Something isn't working), distributed framework

@sandeshkr419 (Contributor) commented May 31, 2023

Describe the bug
Execution time computations in MasterService.java report 0s when the computed time is less than 200ms. This is because the time computations use threadPool.relativeTimeInMillis(), a cached time that is only useful when second-level precision suffices, whereas these cluster state computations and publications are of the order of milliseconds.
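
For illustration, a minimal self-contained sketch (not the actual ThreadPool code) of a clock refreshed every 200ms, in the spirit of the cached time: any operation shorter than the refresh interval usually measures as 0.

import java.util.concurrent.atomic.AtomicLong;

public class CachedClockDemo {
    // Cached "now", refreshed every 200ms by a background thread.
    private static final AtomicLong CACHED_MILLIS = new AtomicLong(System.nanoTime() / 1_000_000);

    public static void main(String[] args) throws InterruptedException {
        Thread refresher = new Thread(() -> {
            while (true) {
                CACHED_MILLIS.set(System.nanoTime() / 1_000_000);
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        refresher.setDaemon(true);
        refresher.start();

        long start = CACHED_MILLIS.get();
        Thread.sleep(50); // a "fast" cluster state update (~50ms)
        long took = CACHED_MILLIS.get() - start;
        // Usually prints "took 0ms": both reads land on the same cached tick.
        System.out.println("took " + took + "ms");
    }
}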

To Reproduce
Steps to reproduce the behavior:

  1. Enable debug logs for MasterService.
  2. Trigger any operation which recomputes the ClusterState and publishes it.

Log lines like the following then appear:
[2023-05-31T11:38:34,142][DEBUG][o.o.c.s.MasterService    ] [master1] took [0s] to compute cluster state update for [index-aliases]
[2023-05-31T11:38:34,307][DEBUG][o.o.c.s.MasterService    ] [master1] took [0s] to notify listeners on successful publication of cluster state (version: 55109, uuid: Fc58ns6mRHCNuxcvADvfOg) for [index-aliases]
[2023-05-31T11:38:32,416][DEBUG][o.o.c.s.MasterService    ] [master1] took [0s] to compute cluster state update for [cluster_reroute(reroute after starting shards)]
[2023-05-31T11:38:32,416][DEBUG][o.o.c.s.MasterService    ] [master1] took [0s] to notify listeners on unchanged cluster state for [cluster_reroute(reroute after starting shards)]

The above times are not useful, since most of these operations execute in less than 200ms. A more granular time calculation would help fine-tune performance and offer more insight when debug logs are enabled.

OpenSearch Data Configuration:

  • 45k aliases, 5 indices (multiple indices to stress cluster state processing).
  • For some operations the actual cluster state time was logged when it was more than 200ms.

Suggested Solution
Use System.currentTimeMillis() instead of the cached time from ThreadPool. There may be other classes where this change is useful; we can start here and change them incrementally as we encounter them, since enabling debug logs and checking the results of every API would not be feasible.
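
A minimal sketch of the suggested direction (illustrative names, not the actual MasterService code; later comments on this thread weigh System.currentTimeMillis() against System.nanoTime()):

public final class DurationLoggingSketch {
    public static void main(String[] args) throws InterruptedException {
        long startMillis = System.currentTimeMillis(); // was: threadPool.relativeTimeInMillis()
        simulateClusterStateUpdate();
        long tookMillis = System.currentTimeMillis() - startMillis;
        // Mirrors the MasterService debug line, now with millisecond precision.
        System.out.printf("took [%dms] to compute cluster state update%n", tookMillis);
    }

    private static void simulateClusterStateUpdate() throws InterruptedException {
        Thread.sleep(50); // stand-in for computing a new ClusterState
    }
}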

Plugins
None enabled, not required.

Host/Environment (please complete the following information):

  • OS: MacOS (M1 Pro processor)
  • Version: 3.0 / main branch
  • Local Cluster with 1 master, 1 data node

Additional context
I was trying to optimize ClusterState creation activity (ref: #7002) when I encountered this.

sandeshkr419 added the bug (Something isn't working) and untriaged labels on May 31, 2023
andrross changed the title from "[BUG]" to "[BUG] Imprecise duration logged by MasterService" on May 31, 2023
@andrross (Member) commented May 31, 2023

actual cluster state time was logged when it was more than 200ms

I think that's not quite right. The time value will only have a precision of the thread_pool.estimated_time_interval value, which defaults to 200ms, right? I think this is by design to avoid the overhead of the call to System.nanoTime(). This optimization is important when computing durations in hot path code, such as search or index operations. The question here is whether it is appropriate to use the cached time here or just always use the system clock. @sandeshkr419 Can you make the case that using the cached time here is not necessary?

@andrross (Member)

See issue #2000 and the linked PR for a similar problem caused by cached time; the linked PR includes some performance data. There is also a microbenchmark for measuring the performance of System.nanoTime and System.currentTimeMillis.
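
For context, a minimal JMH harness in that spirit (an illustrative sketch, not the microbenchmark that lives in the repository) could look like:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

// Measures the average cost of a single call to each clock source.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ClockBenchmark {
    @Benchmark
    public long nanoTime() {
        return System.nanoTime();
    }

    @Benchmark
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}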

@sandeshkr419 (Contributor, Author) commented Jun 1, 2023

Thanks @andrross for sharing this issue. Nice discussion there around the different ways of capturing time.
It seems System.currentTimeMillis can give erroneous results, so we will have to think over whether we can leverage System.nanoTime without any performance regressions, since the existing debug logging for this particular scenario is not at all informative in its present state.

@andrross (Member) commented Jun 1, 2023

we will have to think over whether we can leverage System.nanoTime without any performance regressions

System.nanoTime is not dirt cheap, but it does only take on the order of tens of nanoseconds. For things that happen many thousands of times per second or more in hot path code, this can add up. For things that happen less frequently, ~30 nanoseconds is negligible. My intuition is that things like updating cluster state and notifying listeners are probably not super hot path code, and we can probably calculate an accurate duration by using the real hardware clock instead of the cached time.
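
As a back-of-the-envelope check using the ~30ns figure: timing 100,000 operations per second with two clock reads each (start and end) costs about 6ms of CPU per second, roughly 0.6% of one core, while a handful of cluster state updates per second would cost well under a microsecond in total.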

@shwetathareja Do you have any thoughts here?

@sandeshkr419 (Contributor, Author)

@andrross To add to your argument: cluster state updates and the associated listeners are not too frequent. I am not expecting many of these calls within a second, and an error on the order of 2-5ms would be acceptable here, which is far more than the expected inaccuracy.

@shwetathareja (Member)

@andrross / @sandeshkr419 I agree that during cluster state processing we need more accurate durations, using the system clock and not cached time. Cluster state changes are not frequent, and we need a more accurate breakdown of which listener is taking how much time, or whether computing the new state itself is taking longer. This would help identify bottlenecks in the cluster state update code path.
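
For illustration, a minimal runnable sketch of the kind of per-listener breakdown described here (plain Runnable stand-ins, not the actual cluster state listener interfaces):

import java.util.List;
import java.util.concurrent.TimeUnit;

public final class ListenerTimingSketch {
    public static void main(String[] args) {
        List<Runnable> listeners = List.of(
            () -> sleepQuietly(3),   // a fast listener
            () -> sleepQuietly(40)   // a slow listener worth surfacing in logs
        );
        for (int i = 0; i < listeners.size(); i++) {
            long startNanos = System.nanoTime();
            listeners.get(i).run();
            long tookMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            // Per-listener duration with millisecond precision.
            System.out.printf("listener [%d] took [%dms]%n", i, tookMillis);
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}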

@Bukhtawar (Collaborator)

This would help identify bottlenecks in the cluster state update code path.

For identifying bottlenecks I would prefer JMH benchmarks or profiling over debug logs, if +/- 200ms changes the course of action.
My concern is that if we start following this model, it will get hard to ensure other parts of the code strike a fine balance between when to use cached time and when not to.

@shwetathareja (Member) commented Jun 7, 2023

@Bukhtawar I think we can have guidance on when to use the system clock vs. cached time. JMH benchmarks are good, but we don't know all the scenarios upfront. And having these logs helps debugging when an issue appears in production and cluster state processing and applying suddenly start taking really long. These logs were added with the same intention in the first place.

@andrross (Member) commented Jun 7, 2023

I think we can have guidance on when to use the system clock vs. cached time.

@shwetathareja @Bukhtawar I tend to agree. Also, I think the cat is already out of the bag here in that we have a mix of cached time vs system clock:

% git grep 'System.nanoTime' server/src/main/ | wc -l
     161

% git grep 'relativeTimeInMillis' server/src/main | wc -l
      64

@jainankitk (Collaborator)

@Bukhtawar / @shwetathareja / @andrross - I am wondering if there is any downside to reducing thread_pool.estimated_time_interval to a lower value like 5ms or even 1ms and using the cached time everywhere. If there is a single thread doing this every ms (ms precision should suffice for all our use cases), would this cause any performance impact?

That would also avoid unnecessary invocations of System.nanoTime like in the PR linked above.
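
For reference, the refresh interval in question is the node setting named above, so the experiment could be as simple as lowering it in opensearch.yml (1ms here is the hypothetical value from the question, not a recommendation):

thread_pool.estimated_time_interval: 1ms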

@andrross (Member)

@jainankitk I honestly don't know. See this comment for a test showing that calling System.nanoTime once per millisecond in a loop took about 6% of one CPU. Do you suspect that this commit is causing any sort of performance impact?

@sandeshkr419 (Contributor, Author)

@jainankitk I have a doubt regarding the cached time. Does setting the cached time interval to 1ms imply that the cached clock refreshes (i.e. invokes the system clock) every 1ms? If yes, can a cached clock with a 1ms refresh interval itself cause a regression? The code in scope of this issue is not aggressive in invocation frequency, so if there are other places that require a lower cached time interval we can modify this code to utilize the new 1ms cached time as well, but introducing cached time for this scope alone does not seem reasonable.

@shwetathareja (Member)

If there is a single thread doing this every ms (ms precision should suffice for all our use cases), would this cause any performance impact?

@jainankitk This would be one thread per threadpool, right? We can benchmark to see the impact.
