Skip to content
This repository has been archived by the owner on Mar 27, 2021. It is now read-only.

Implement new Bigtable timeout & retry settings #733

Merged
merged 41 commits into from
Mar 4, 2021

Conversation

sming
Copy link
Contributor

@sming sming commented Dec 18, 2020

Use Case Resolved: Heroic pipeline instability and poor user experience for Grafana

  • I am a Heroic developer
  • who wants to armour up Heroic so that we get fewer alerts
  • so that we can focus on features

Plan

  • look at the spotify-bigtable-client repo for "inspiration"
  • don't implement a whole new layer of configuration. Focus on the 2 or 3 settings we're immediately interested in.

Notes for Reviewer

  • only read requests are affected (i.e. Heroic API but not metric Ingestion)
  • a maxMutateRowTimeout is implemented but will be deployed with the default setting value so as not to change Ingestion at all. This is done for development efficiency reasons i.e. it would have been foolish not to add support for it whilst I was changing the exact same files in the exact same places. Here is the heroic-gke PR that puts mutateRpcTimeoutMs back to 10 minutes.

@sming sming changed the title Implement new Bigtable timeout & retry settings [DRAFT] Implement new Bigtable timeout & retry settings Dec 19, 2020
Peter Kingswell added 2 commits December 19, 2020 19:28
- passing all tests
- next is to refactor the 5 parameters into a class

Signed-off-by: Peter Kingswell <[email protected]>
Signed-off-by: Peter Kingswell <[email protected]>
Copy link
Contributor

@malish8632 malish8632 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Peter Kingswell added 2 commits December 21, 2020 15:44
@codecov
Copy link

codecov bot commented Dec 22, 2020

Codecov Report

Merging #733 (e380555) into master (2e63167) will increase coverage by 0.10%.
The diff coverage is 89.13%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #733      +/-   ##
============================================
+ Coverage     55.09%   55.20%   +0.10%     
- Complexity     3163     3175      +12     
============================================
  Files           749      751       +2     
  Lines         20427    20544     +117     
  Branches       1341     1343       +2     
============================================
+ Hits          11254    11341      +87     
- Misses         8684     8716      +32     
+ Partials        489      487       -2     
Impacted Files Coverage Δ Complexity Δ
...m/spotify/heroic/http/tracing/OpenCensusUtils.java 0.00% <ø> (ø) 0.00 <0.00> (ø)
...c/test/AbstractMetadataBackendIndexResourceIT.java 94.47% <0.00%> (ø) 29.00 <0.00> (ø)
...ic/analytics/bigtable/BigtableAnalyticsModule.java 65.07% <0.00%> (ø) 9.00 <0.00> (ø)
...heroic/suggest/elasticsearch/SuggestBackendKV.java 82.22% <ø> (+0.74%) 79.00 <0.00> (+2.00)
...com/spotify/heroic/metric/MetricManagerModule.java 71.31% <23.80%> (-2.80%) 18.00 <0.00> (ø)
...m/spotify/heroic/metric/consts/ApiQueryConsts.java 66.66% <66.66%> (ø) 1.00 <1.00> (?)
...spotify/heroic/metric/MetricsConnectionSettings.kt 69.23% <69.23%> (ø) 2.00 <2.00> (?)
...potify/heroic/metric/bigtable/BigtableBackend.java 85.76% <83.33%> (-0.06%) 61.00 <1.00> (ø)
...y/heroic/metric/bigtable/BigtableMetricModule.java 81.51% <87.50%> (-4.35%) 6.00 <0.00> (-1.00)
.../spotify/heroic/test/AbstractSuggestBackendIT.java 99.14% <98.60%> (-0.36%) 53.00 <30.00> (+6.00) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e63167...e380555. Read the comment docs.

@sming sming changed the title [DRAFT] Implement new Bigtable timeout & retry settings Implement new Bigtable timeout & retry settings Dec 22, 2020
@sming sming marked this pull request as ready for review December 22, 2020 20:37
malish8632
malish8632 previously approved these changes Dec 22, 2020
Copy link
Contributor

@malish8632 malish8632 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

adsail
adsail previously approved these changes Dec 22, 2020
Peter Kingswell added 7 commits December 22, 2020 16:32
and localhost IT test runs where it works locally but not in CI.

Signed-off-by: Peter Kingswell <[email protected]>
Signed-off-by: Peter Kingswell <[email protected]>
Signed-off-by: Peter Kingswell <[email protected]>
@sming sming linked an issue Dec 27, 2020 that may be closed by this pull request
Peter Kingswell added 2 commits December 28, 2020 12:30
down the road all tests of this kind (timer-based) will
need redoing as they're lame and brittle.

Signed-off-by: Peter Kingswell <[email protected]>
Peter Kingswell added 5 commits January 29, 2021 13:20
to prevent the intermittent timeouts that are being observed.
…tings-refactored' into feature/add-bigtable-timeout-settings-refactored
build intermittently failing on com.spotify.heroic.AbstractClusterQueryIT
distributedFilterQueryTest  - investigating.
@sming sming dismissed stale reviews from adsail and malish8632 via 5b8da86 February 8, 2021 16:52
Peter Kingswell added 5 commits February 12, 2021 12:48
- occurred when a timeout happened
- full message: java.lang.IllegalStateException: Could not find policy
'pick_first'. Make sure its implementation is either registered to
LoadBalancerRegistry or included in
META-INF/services/io.grpc.LoadBalancerProvider from your jar files.
@adsail adsail self-requested a review March 1, 2021 21:06
settings.
tidied up docs too
@adsail adsail self-requested a review March 2, 2021 18:11
adsail
adsail previously approved these changes Mar 2, 2021
malish8632
malish8632 previously approved these changes Mar 3, 2021
@sming sming dismissed stale reviews from malish8632 and adsail via 69f6fad March 3, 2021 20:02
@adsail adsail self-requested a review March 4, 2021 18:51
@sming sming merged commit c6b5a6f into master Mar 4, 2021
@delete-merged-branch delete-merged-branch bot deleted the feature/add-bigtable-timeout-settings branch March 4, 2021 19:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement new Bigtable timeout settings
4 participants