Download offline segments from peers #9710
This is awesome. How are you thinking about observability here? If you didn't expect the deep store data to be missing, but servers are silently downloading segments from one another, how would you know?
Thanks for pointing it out. Will do.
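One way to act on the observability point, sketched under assumptions: every peer download increments a counter and logs a warning that names the segment and the peer, so a missing deep-store copy no longer goes unnoticed. `PeerDownloadStats` and `recordPeerDownload` are hypothetical names; a real change would more likely report through Pinot's own server metrics, which are not shown here.

```java
import java.net.URI;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper (not part of Pinot): counts and logs every segment that had to be
// fetched from a peer server because the deep-store copy was unavailable.
final class PeerDownloadStats {
  private static final Logger LOGGER = Logger.getLogger(PeerDownloadStats.class.getName());
  private final AtomicLong peerDownloads = new AtomicLong();

  // Call whenever a segment is fetched from a peer instead of the deep store.
  // Returns the running total so callers or tests can inspect it.
  long recordPeerDownload(String tableName, String segmentName, URI peerUri) {
    long total = peerDownloads.incrementAndGet();
    LOGGER.warning(String.format(
        "Segment %s of table %s was downloaded from peer %s instead of the deep store (peer downloads so far: %d)",
        segmentName, tableName, peerUri, total));
    return total;
  }

  long peerDownloadCount() {
    return peerDownloads.get();
  }
}
```

A warning-level log plus a monotonically increasing counter gives both a per-event trail and something an alerting system could threshold on.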
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/SegmentFetcherFactory.java
pinot-spi/src/main/java/org/apache/pinot/spi/config/instance/InstanceDataManagerConfig.java
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
Done.
8401a48 to 7685439
Rebased and resolved the conflict. A newbie question: the guide recommends using
With GitHub, I generally use If you've ever used something like Phabricator, it does a better job of decoupling changes from their actual commits, so even if someone rebases or force-pushes, it can show you just the changes that user made.
Overall looks good; I left a few comments. I think you'll need one of the maintainers to approve the tests to run.
```java
private void fetchAndDecryptSegmentToLocalInternal(@NonNull List<URI> uris, File dest, String crypterName)
    throws Exception {
  Preconditions.checkArgument(!uris.isEmpty(), "empty uris passed into the fetchAndDecryptSegmentToLocalInternal");
  // Pick one peer URI uniformly at random.
  URI uri = uris.get(RANDOM.nextInt(uris.size()));
  // ... (remainder of the method is not shown in this review excerpt)
}
```
These are all the peer URIs, and you're randomly selecting one? If so, this feels like it should live in the calling function instead. There's no need to duplicate the fetch functions here just to add a random selector.
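A minimal sketch of the reviewer's suggestion, assuming a hypothetical single-URI fetch method (`SingleUriFetcher.fetchAndDecrypt`) that stands in for the PR's existing fetch-and-decrypt path. The caller picks the random peer, so the fetch functions need no list-taking duplicates. `PeerSegmentDownloader` and `downloadFromRandomPeer` are also illustrative names.

```java
import java.io.File;
import java.net.URI;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the existing single-URI fetch-and-decrypt path.
interface SingleUriFetcher {
  void fetchAndDecrypt(URI uri, File dest, String crypterName) throws Exception;
}

// Caller-side peer selection: choose one peer URI, then reuse the single-URI fetch.
final class PeerSegmentDownloader {
  private final Random random;
  private final SingleUriFetcher fetcher;

  PeerSegmentDownloader(Random random, SingleUriFetcher fetcher) {
    this.random = random;
    this.fetcher = fetcher;
  }

  // Returns the peer URI that was used, which is also handy for logging.
  URI downloadFromRandomPeer(List<URI> peerUris, File dest, String crypterName) throws Exception {
    if (peerUris.isEmpty()) {
      throw new IllegalArgumentException("No peer URIs available to download from");
    }
    URI chosen = peerUris.get(random.nextInt(peerUris.size()));
    fetcher.fetchAndDecrypt(chosen, dest, crypterName);
    return chosen;
  }
}
```

Injecting the `Random` keeps the selection deterministic in tests while leaving the fetch logic in one place.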
I am considering whether we should introduce the strategy pattern here: the user could choose either to pick a random peer or to go over all peers, so the trade-off between responsiveness and data reliability is left to the user. For the sake of time, I've hidden the random implementation inside this function. In the future, I think the function signature would be:
```java
private void fetchAndDecryptSegmentToLocalInternal(@NonNull List<URI> uris, File dest, String crypterName,
    PeerDownloaderStrategy strategy);

interface PeerDownloaderStrategy {
  Response download(List<URI> uris, File dest, Context context);
}

class Context {
  String crypterName;
}
```
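A sketch of how the proposed strategy interface could look with the two policies mentioned (one random peer vs. all peers). `Context` and `PeerDownloaderStrategy` follow the proposed signature; `Response`, `UriFetcher`, `RandomPeerStrategy`, and `AllPeersStrategy` are hypothetical names added so the example compiles on its own.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// From the proposal: per-download context passed to a strategy.
final class Context {
  String crypterName;
}

// Hypothetical result type: whether the download succeeded and from which peer.
final class Response {
  final boolean success;
  final URI sourceUri; // null when no peer succeeded

  Response(boolean success, URI sourceUri) {
    this.success = success;
    this.sourceUri = sourceUri;
  }
}

// Hypothetical single-peer fetch used by both strategies.
interface UriFetcher {
  void fetch(URI uri, File dest, Context context) throws Exception;
}

// From the proposal: the pluggable peer-selection policy.
interface PeerDownloaderStrategy {
  Response download(List<URI> uris, File dest, Context context);
}

// Favors responsiveness: a single attempt against one randomly chosen peer.
final class RandomPeerStrategy implements PeerDownloaderStrategy {
  private final UriFetcher fetcher;
  private final Random random;

  RandomPeerStrategy(UriFetcher fetcher, Random random) {
    this.fetcher = fetcher;
    this.random = random;
  }

  @Override
  public Response download(List<URI> uris, File dest, Context context) {
    if (uris.isEmpty()) {
      return new Response(false, null);
    }
    URI uri = uris.get(random.nextInt(uris.size()));
    try {
      fetcher.fetch(uri, dest, context);
      return new Response(true, uri);
    } catch (Exception e) {
      return new Response(false, null);
    }
  }
}

// Favors data reliability: tries every peer, in shuffled order, until one succeeds.
final class AllPeersStrategy implements PeerDownloaderStrategy {
  private final UriFetcher fetcher;
  private final Random random;

  AllPeersStrategy(UriFetcher fetcher, Random random) {
    this.fetcher = fetcher;
    this.random = random;
  }

  @Override
  public Response download(List<URI> uris, File dest, Context context) {
    List<URI> shuffled = new ArrayList<>(uris);
    Collections.shuffle(shuffled, random); // spread load across peers
    for (URI uri : shuffled) {
      try {
        fetcher.fetch(uri, dest, context);
        return new Response(true, uri);
      } catch (Exception e) {
        // This peer failed; try the next one.
      }
    }
    return new Response(false, null);
  }
}
```

Shuffling before iterating keeps the all-peers policy from always hitting the same first replica.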
For this PR, I won't integrate the strategy pattern; I'll use random peer selection by default. What do you think? @jadami10
Thanks, I totally agree with you. I used GitHub before joining Uber. I meant to use
pinot-core/src/test/java/org/apache/pinot/core/data/manager/BaseTableDataManagerTest.java
Codecov Report
```diff
@@             Coverage Diff              @@
##             master    #9710       +/-   ##
=============================================
+ Coverage     34.69%   68.70%   +34.00%
- Complexity      190     4995     +4805
=============================================
  Files          1965     1965
  Lines        105115   105164       +49
  Branches      15909    15914        +5
=============================================
+ Hits          36474    72251    +35777
+ Misses        65542    27793    -37749
- Partials       3099     5120     +2021
```
@chenboat I haven't integrated the instance-level config into realtime tables in this approach. Can I skip that for this PR, even though it makes the rule for determining the final peer download scheme less obvious? See the PR description for details on the rule.
Design doc
Justification
Download offline segments from peers if the remote copies have disappeared. #9709
Backward Compatibility
Add a new instance-level config: pinot.server.peer.download.scheme
IF the table type is realtime: final peer download scheme = the table config's peerSegmentDownloadScheme (an existing feature).
ELIF the table type is offline and streaming download is disabled: final peer download scheme = the table config's peerSegmentDownloadScheme if it is set (non-null), otherwise the instance-level pinot.server.peer.download.scheme.
ELSE: peer downloading is disabled.
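The rule above, written as a small Java method for clarity. `TableType` here is a local stand-in enum and `PeerDownloadSchemeResolver.resolve` is a hypothetical name, not the PR's actual code; a `null` return means peer downloading is disabled.

```java
// Local stand-in for the table type; Pinot has its own enum, which is not used here.
enum TableType { OFFLINE, REALTIME }

// Hypothetical resolver mirroring the IF / ELIF / ELSE rule in the PR description.
final class PeerDownloadSchemeResolver {
  /**
   * @param tableType                type of the table
   * @param streamingDownloadEnabled whether streaming segment download is enabled
   * @param tablePeerScheme          table config's peerSegmentDownloadScheme (may be null)
   * @param instancePeerScheme       instance-level pinot.server.peer.download.scheme (may be null)
   * @return the scheme to use, or null if peer downloading is disabled
   */
  static String resolve(TableType tableType, boolean streamingDownloadEnabled,
      String tablePeerScheme, String instancePeerScheme) {
    if (tableType == TableType.REALTIME) {
      // Existing behavior: realtime tables only honor the table-level setting.
      return tablePeerScheme;
    }
    if (!streamingDownloadEnabled) {
      // Offline table: the table-level setting overrides the instance-level default.
      return tablePeerScheme != null ? tablePeerScheme : instancePeerScheme;
    }
    // Offline table with streaming download enabled: peer downloading is disabled.
    return null;
  }
}
```

This also makes the open question to @chenboat concrete: for realtime tables the instance-level value is never consulted.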
Implementation
Get the peers' segment URIs from ZK, randomly pick one, and download the segment from it.
Manual Test
Before this change: delete an offline segment's HDFS copy and replace an offline server node holding a replica of that segment. The associated segment replica goes into the ERROR state in the Helix external view. After rolling out this feature, setting either the instance-level or the table-level peerDownloadingScheme brings back the missing replica.