Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AdaptiveServerSelection: Update stats for servers that have not responded #9801

Merged
merged 1 commit into from
Nov 15, 2022

Conversation

vvivekiyer
Copy link
Contributor

@vvivekiyer vvivekiyer commented Nov 15, 2022

label = bugfix

When broker routes a query to a server, but the server doesn't respond back within the timeout, the numInFlightRequests in ServerRoutingStats is never decremented. So, the algorithm now thinks that this server has outstanding requests and prefers this server lesser than the others.

This PR fixes this bug. Added tests to confirm that numInFlightRequests is always zero - even if a server doesn't respond to a query.

@siddharthteotia please review.

@vvivekiyer vvivekiyer force-pushed the fixAdssResponseStatsBug branch from 4e23486 to 5d294b5 Compare November 15, 2022 05:37
@vvivekiyer vvivekiyer marked this pull request as ready for review November 15, 2022 05:40
@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2022

Codecov Report

Merging #9801 (5d294b5) into master (463c120) will decrease coverage by 5.98%.
The diff coverage is 91.66%.

@@             Coverage Diff              @@
##             master    #9801      +/-   ##
============================================
- Coverage     70.15%   64.16%   -5.99%     
+ Complexity     5410     4975     -435     
============================================
  Files          1956     1908      -48     
  Lines        104975   102564    -2411     
  Branches      15892    15597     -295     
============================================
- Hits          73641    65814    -7827     
- Misses        26195    31975    +5780     
+ Partials       5139     4775     -364     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.82% <91.66%> (+0.04%) ⬆️
unittests2 15.71% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pache/pinot/core/transport/AsyncQueryResponse.java 74.57% <90.90%> (-17.43%) ⬇️
...a/org/apache/pinot/core/transport/QueryRouter.java 75.00% <100.00%> (-8.34%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/TableDeletionMessage.java 0.00% <0.00%> (-100.00%) ⬇️
... and 447 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

ServerResponse response = _responseMap.get(serverRoutingInstance);
response.receiveDataTable(dataTable, responseSize, deserializationTimeMs);

// Record query completion stats immediately after receiving the response from the server instead of waiting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow this change. Was this a bug ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bug. I just consolidated all recordStatsUponResponseArrival into AsyncResponse instead of keeping it scattered all over. It makes sense to be here because if our response handling changes in the future, the author will be reminded to handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for clarifying.

@siddharthteotia siddharthteotia merged commit e307106 into apache:master Nov 15, 2022
@vvivekiyer vvivekiyer deleted the fixAdssResponseStatsBug branch November 15, 2022 07:59
Copy link
Contributor

@jasperjiaguo jasperjiaguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

siddharthteotia pushed a commit that referenced this pull request Nov 15, 2022
vvivekiyer added a commit to vvivekiyer/pinot that referenced this pull request Nov 29, 2022
jackjlli pushed a commit that referenced this pull request Nov 29, 2022
* AdaptiveServerSelection: update response stats for servers that have not responded (#9801)

* Optimize AdaptiveServerSelection for replicaGroup based routing (#9803)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants