Fix SegRep bug where primary shards do not update tracked replica state post failover/relocation #11017

Merged
kotwanikunal merged 1 commit into opensearch-project:main from mch2:bug on Oct 31, 2023

@mch2 (Member) commented Oct 31, 2023

Description

This change fixes a bug that can occur with SegRep where the primary shard does not update its local tracking of each replica's ReplicationCheckpoint state. This can happen after a failover or relocation, when a new primary publishes a checkpoint that is swallowed by replicas that are already up to date, so the primary's tracked state for those replicas is never refreshed. Unless a new write is made to the index, the reported replication lag will grow indefinitely.
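To make the failure mode concrete, here is a hypothetical, heavily simplified model of it. It is not the actual change to SegmentReplicationTargetService: a checkpoint is reduced to a single version number, and every class and method name (`CheckpointTrackingSketch`, `Primary`, `Replica`, `onNewCheckpoint`, `reportWhenUpToDate`) is invented for illustration. The assumed fix, shown behind a flag, is that an up-to-date replica still reports what it holds to the (possibly new) primary instead of silently dropping the checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the SegRep tracking bug. None of these types are
 * OpenSearch APIs; a checkpoint is just a monotonically increasing version.
 */
final class CheckpointTrackingSketch {

    private CheckpointTrackingSketch() {}

    /** Primary-side bookkeeping of the last version each replica reported. */
    static final class Primary {
        private final Map<String, Long> trackedReplicaVersions = new HashMap<>();
        private final long latestVersion;

        Primary(long latestVersion) {
            this.latestVersion = latestVersion;
        }

        /** Called when a replica reports the checkpoint version it holds. */
        void onReplicaReport(String replicaId, long version) {
            trackedReplicaVersions.put(replicaId, version);
        }

        /** A replica with no report yet (e.g. right after failover) counts as lagging. */
        boolean isLagging(String replicaId) {
            Long reported = trackedReplicaVersions.get(replicaId);
            return reported == null || reported < latestVersion;
        }
    }

    /** Replica-side handling of a checkpoint published by the primary. */
    static final class Replica {
        private final String id;
        private long localVersion;

        Replica(String id, long localVersion) {
            this.id = id;
            this.localVersion = localVersion;
        }

        /**
         * Returns true if a segment copy was started. With reportWhenUpToDate
         * set to false this mimics the buggy path: an up-to-date replica
         * swallows the checkpoint and the primary never hears back.
         */
        boolean onNewCheckpoint(long publishedVersion, Primary primary, boolean reportWhenUpToDate) {
            if (publishedVersion <= localVersion) {
                if (reportWhenUpToDate) {
                    // Assumed fix: tell the primary what we already hold.
                    primary.onReplicaReport(id, localVersion);
                }
                return false;
            }
            // Behind the primary: "copy" segments, then report the new state.
            localVersion = publishedVersion;
            primary.onReplicaReport(id, localVersion);
            return true;
        }
    }
}
```

In this model, a freshly promoted `Primary(5L)` paired with a `Replica` already at version 5 keeps reporting that replica as lagging on the buggy path, because nothing ever populates its tracking map until a new write produces version 6; on the fixed path the replica's reply clears the lag immediately.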

Related Issues

Resolves #11016

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…te post failover/relocation

Signed-off-by: Marc Handalian <[email protected]>
@github-actions (Contributor)

Compatibility status:

Checks if related components are compatible with change 8930681

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@mch2 added the "backport 2.x" (Backport to 2.x branch) label on Oct 31, 2023
@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries

@codecov bot commented Oct 31, 2023

Codecov Report

Merging #11017 (8930681) into main (63aff16) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main   #11017      +/-   ##
============================================
+ Coverage     71.29%   71.34%   +0.05%     
- Complexity    58742    58824      +82     
============================================
  Files          4872     4872              
  Lines        276777   276780       +3     
  Branches      40240    40241       +1     
============================================
+ Hits         197316   197468     +152     
+ Misses        62943    62851      -92     
+ Partials      16518    16461      -57     
Files | Coverage Δ
...s/replication/SegmentReplicationTargetService.java | 56.17% <100.00%> (+0.56%) ⬆️

... and 486 files with indirect coverage changes
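The headline figures in the Codecov report are internally consistent, and a few lines of arithmetic recover them. The sketch below assumes Codecov's documented ratio, coverage = hits / (hits + misses + partials), with partial lines counted as not covered, and assumes rounding to two decimals for display; the class and method names are illustrative only.

```java
/**
 * Recomputes the Codecov summary numbers, assuming
 * coverage = hits / (hits + misses + partials).
 */
final class CoverageArithmetic {

    private CoverageArithmetic() {}

    /** Total tracked lines: every line is a hit, a miss, or a partial. */
    static long trackedLines(long hits, long misses, long partials) {
        return hits + misses + partials;
    }

    /** Coverage as a percentage, rounded to two decimals for display. */
    static double coveragePercent(long hits, long misses, long partials) {
        double ratio = (double) hits / trackedLines(hits, misses, partials);
        return Math.round(ratio * 10_000.0) / 100.0;
    }
}
```

For main, `trackedLines(197316, 62943, 16518)` is 276777 and `coveragePercent` gives 71.29; for #11017, `trackedLines(197468, 62851, 16461)` is 276780 and `coveragePercent` gives 71.34, which matches the reported +0.05% change.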

@kotwanikunal merged commit a2febe9 into opensearch-project:main on Oct 31, 2023
@mch2 deleted the bug branch on October 31, 2023 08:21
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 31, 2023
…te post failover/relocation (#11017)

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit a2febe9)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 pushed a commit that referenced this pull request Oct 31, 2023
…te post failover/relocation (#11017) (#11018)

(cherry picked from commit a2febe9)

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…te post failover/relocation (opensearch-project#11017)

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Labels
backport 2.x (Backport to 2.x branch), bug (Something isn't working), skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Segment Replication - Replication lag not updated post failover
2 participants