Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split thread cpu time into three metrics #7724

Merged
merged 2 commits into from
Nov 18, 2021

Conversation

mqliang
Copy link
Contributor

@mqliang mqliang commented Nov 8, 2021

Description

This PR split the ThreadCpuTimeNs metric into 3 parts:

  • threadCpuTimeNs
  • systemActivitiesCpuTimeNs
  • responseSerializationCpuTimeNs

Splitting will help us understand the time spent on the 3 part separately.

screenshoot of offline table:
offline-screenshoot

screenshoot of realtime table:
realtime-screenshoot

screenshoot of hybrid table:
hybrid-screenshoot

cc @siddharthteotia @mcvsubbu @gshara

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Nov 8, 2021

Codecov Report

Merging #7724 (11de7d0) into master (af01aa5) will increase coverage by 0.99%.
The diff coverage is 69.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7724      +/-   ##
============================================
+ Coverage     64.23%   65.22%   +0.99%     
- Complexity     3985     4084      +99     
============================================
  Files          1568     1533      -35     
  Lines         80194    78888    -1306     
  Branches      11929    11792     -137     
============================================
- Hits          51510    51456      -54     
+ Misses        25022    23774    -1248     
+ Partials       3662     3658       -4     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 68.69% <85.96%> (+0.05%) ⬆️
unittests2 14.59% <4.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...roker/requesthandler/BaseBrokerRequestHandler.java 21.64% <0.00%> (-42.20%) ⬇️
...org/apache/pinot/broker/api/RequestStatistics.java 37.75% <25.00%> (-27.11%) ⬇️
...che/pinot/core/query/scheduler/QueryScheduler.java 68.75% <53.84%> (-15.02%) ⬇️
...e/pinot/core/query/reduce/BrokerReduceService.java 73.41% <70.58%> (-21.63%) ⬇️
...a/org/apache/pinot/common/metrics/BrokerTimer.java 89.47% <100.00%> (-2.84%) ⬇️
...a/org/apache/pinot/common/metrics/ServerTimer.java 84.61% <100.00%> (-5.39%) ⬇️
...t/common/response/broker/BrokerResponseNative.java 92.48% <100.00%> (+0.73%) ⬆️
.../java/org/apache/pinot/common/utils/DataTable.java 89.47% <100.00%> (+0.58%) ⬆️
...e/pinot/core/common/datatable/DataTableImplV3.java 96.06% <100.00%> (+0.04%) ⬆️
.../pinot/core/operator/InstanceResponseOperator.java 86.84% <100.00%> (+2.97%) ⬆️
... and 554 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af01aa5...11de7d0. Read the comment docs.

@mqliang mqliang closed this Nov 9, 2021
@mqliang mqliang reopened this Nov 9, 2021
@mqliang mqliang closed this Nov 9, 2021
@mqliang mqliang reopened this Nov 9, 2021
@mqliang mqliang closed this Nov 9, 2021
@mqliang mqliang reopened this Nov 9, 2021
@mqliang mqliang closed this Nov 9, 2021
@mqliang mqliang reopened this Nov 9, 2021
@mqliang mqliang closed this Nov 10, 2021
@mqliang mqliang reopened this Nov 10, 2021
@mqliang mqliang closed this Nov 10, 2021
@mqliang mqliang reopened this Nov 10, 2021
@mqliang mqliang closed this Nov 11, 2021
@mqliang mqliang reopened this Nov 11, 2021
@mqliang mqliang closed this Nov 12, 2021
@mqliang mqliang reopened this Nov 12, 2021
@GSharayu
Copy link
Contributor

LGTM thanks

@mqliang mqliang force-pushed the split-metric branch 2 times, most recently from b6165d2 to 114a2d7 Compare November 16, 2021 23:44
@mqliang
Copy link
Contributor Author

mqliang commented Nov 16, 2021

PR updated:

  • add a total_cpu_time log/metric which is the sum of the split metric.
  • fix a bug of systemActivtiesCpuTimeNs calculation

screenshoot of offline table:

offline-screenshoot

screenshoot of realtime table:
realtime-screenshoot

screenshoot of hybrid table:
hybrid-screenshoot

@mqliang mqliang closed this Nov 16, 2021
@mqliang mqliang reopened this Nov 16, 2021
@mqliang mqliang closed this Nov 17, 2021
@mqliang mqliang reopened this Nov 17, 2021
LOGGER.info(
"Broker Id: {}, timeout: {}ms, query response limit: {}, query log length: {}, query log max rate: {}qps",
_brokerId, _brokerTimeoutMs, _queryResponseLimit, _queryLogLength, _queryLogRateLimiter.getRate());
LOGGER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please check if you import the latest Pinot Style? There are lots of reformatting changes and seems they don't follow the Pinot Style: https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup#setup-ide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jackie-Jiang I just double checked:

  • delete the Pinot Style from my Intellj->Editor -> Code Style -> Java
  • import codestyle-intellij.xml from pinot/config again.
  • reformat the file using the new Pinot Style

Nothing changed. Which means the reformatting in the PR followed Pinot Style, while the old file not.

Could you please do me a favor by importing the latest Pinot Style and perform Reformat code against this file using your IntellJ (using the current master branch)?

  • If your file does not change, then maybe my IntellJ have some issues, I will revert those reformatting change
  • If your file also get changed, let's keep those reformatting change in the PR.

Does it sounds good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried that before leaving the comment and the current master file didn't change.
In the Editor -> Code Style -> Java setting, under Wrapping and Braces -> Keep when reformatting, can you uncheck the Line breaks and try reformatting again? Current setting preserves the line break when reformatting because in certain cases we want to break the line in a different way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks!

Copy link
Contributor

@siddharthteotia siddharthteotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor comments to fix the argument name

* Set the total cpu time(thread cpu time + system activities cpu time + response serialization cpu time) used
* against offline table in request handling, into the broker response.
*/
void setOfflineTotalCpuTimeNs(long realtimeResponseSerializationCpuTimeNs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument name is incorrect

* Set the total cpu time(thread cpu time + system activities cpu time + response serialization cpu time) used
* against realtime table in request handling, into the broker response.
*/
void setRealtimeTotalCpuTimeNs(long realtimeResponseSerializationCpuTimeNs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@siddharthteotia siddharthteotia merged commit 9662399 into apache:master Nov 18, 2021
@mqliang mqliang deleted the split-metric branch November 18, 2021 20:42
@kishoreg
Copy link
Member

have we measured the impact of measuring these numbers?

@mqliang
Copy link
Contributor Author

mqliang commented Nov 19, 2021

@kishoreg

have we measured the impact of measuring these numbers?

We benchmarked, JVM have the stats for each thread, this measuring just retrieve the number from JVM, so the overhead is negligible.

In addition, the measuring is disabled by default:

public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;

@richardstartin
Copy link
Member

JVM have the stats for each thread, this measuring just retrieve the number from JVM, so the overhead is negligible.

The JVM gets the data from a clock - it delegates to clock_gettime(CLOCK_THREAD_CPUTIME_ID). Whether this call is fast or slow depends on whether the clock CLOCK_THREAD_CPUTIME_ID is in vDSO or not (which prevents a syscall) and this is platform dependent. On some platforms each call can take hundreds of nanoseconds or more, so it’s important to measure this for each deployment environment before enabling thread CPU time.

@mqliang
Copy link
Contributor Author

mqliang commented Nov 22, 2021

@richardstartin

On some platforms each call can take hundreds of nanoseconds or more, so it’s important to measure this for each deployment environment before enabling thread CPU time.

What do you suggest? How about let's add a simple benchmark in pinot-perf. And it's user's responsibility to run the benchmark on their platform and decide if enable the thread timer.

kriti-sc pushed a commit to kriti-sc/incubator-pinot that referenced this pull request Dec 12, 2021
* Split thread CPU time into three metrics

* fix typo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants