Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory #7969

Merged
merged 12 commits into from
Jan 14, 2022

Conversation

klsince
Copy link
Contributor

@klsince klsince commented Dec 30, 2021

This PR tries to refactor the segment loading logic in BaseTableDataManager to rely on SegmentDirectory interface instead of always assuming a concrete local segment directory, to make the logic more extensible, allowing the processed segment to be kept on local disk or remote fs transparently.

For that, we have added a copyTo() method in SegmentDirectory abstract class, and use it to decouple the segment loading logic from where the processed segment is kept physically. Because the segment reloading logic is already conducted on the copy of index directory in order to handle reloading failure, the new copyTo() method fits into the existing logic flow well.

Description

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2021

Codecov Report

Merging #7969 (ad6d167) into master (823aa07) will decrease coverage by 0.04%.
The diff coverage is 75.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7969      +/-   ##
============================================
- Coverage     71.35%   71.31%   -0.05%     
- Complexity     4218     4221       +3     
============================================
  Files          1595     1596       +1     
  Lines         82748    82874     +126     
  Branches      12348    12365      +17     
============================================
+ Hits          59046    59102      +56     
- Misses        19715    19766      +51     
- Partials       3987     4006      +19     
Flag Coverage Δ
integration1 28.97% <27.90%> (-0.10%) ⬇️
integration2 27.63% <27.32%> (+0.07%) ⬆️
unittests1 68.13% <78.06%> (-0.05%) ⬇️
unittests2 14.33% <0.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ent/index/column/PhysicalColumnIndexContainer.java 90.83% <0.00%> (-0.84%) ⬇️
...ache/pinot/segment/spi/store/SegmentDirectory.java 55.55% <0.00%> (-6.95%) ⬇️
...egment/spi/index/metadata/SegmentMetadataImpl.java 74.28% <28.57%> (-0.29%) ⬇️
...server/starter/helix/HelixInstanceDataManager.java 79.44% <47.05%> (-2.64%) ⬇️
...indexsegment/immutable/ImmutableSegmentLoader.java 84.15% <73.52%> (-7.51%) ⬇️
.../pinot/core/data/manager/BaseTableDataManager.java 87.44% <80.24%> (+2.67%) ⬆️
...ocal/segment/index/loader/SegmentPreProcessor.java 80.00% <83.33%> (+0.21%) ⬆️
...nt/local/loader/DefaultSegmentDirectoryLoader.java 100.00% <100.00%> (ø)
.../columnminmaxvalue/ColumnMinMaxValueGenerator.java 73.68% <100.00%> (ø)
...loader/defaultcolumn/BaseDefaultColumnHandler.java 63.26% <100.00%> (ø)
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 823aa07...ad6d167. Read the comment docs.

@klsince klsince force-pushed the decouple_segment_load_with_local_file branch from 95f97a6 to f76a05c Compare January 4, 2022 00:17
Copy link
Contributor

@npawar npawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@klsince
Copy link
Contributor Author

klsince commented Jan 4, 2022

Thanks for the comments @npawar @jackjlli.

cc @siddharthteotia @Jackie-Jiang to help take a look as well, as you helped review the previous PR, a prep for this one. Thank you!

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing. Please hold merging


public static void createBackup(File indexDir)
throws IOException {
if (!indexDir.exists()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this check been added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replied below

Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
"Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
// Copy the backup dir back to proceed.
FileUtils.copyDirectory(segmentBackupDir, indexDir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to rename the current directory and reload. Can you clarify why we have changed it to copy the segment instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renameTo happens at L172. The logic to create backup dir is not changed but moved to this util method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that the rename happens the same way, but prior to this PR, we just downloaded the segment. Looks like we are now copying after the rename (and then perhaps overwriting what we copied?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's on purpose to rename and then copy back, so that the subsequent segment preprocessing happens on the copy dir w/o affecting the ongoing queries, which if any are accessing the renamed dir which is kept intact even if segment reloading fails on the copy dir. As context, renameTo is an atomic operation.

prior to this PR, we just downloaded the segment

Previously, downloading doesn't always happens. It happens if segment CRC changes or forceDownload option is set to true, when one requests to reload segment (so downloading raw segment is relatively rare). In this PR, I made copy always happens, and if downloading happens, it overwrites the copy dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change has been reverted. Just that copying backup directory back is done with SegmentDirectory.copyTo() now to decouple the logic with local segment directory.

indexDir = downloadSegment(segmentName, zkMetadata);
boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why we download from a "tier" (I assume this to mean a peer storage server that already hosts the segment), as opposed to always downloading from the master and keep things simple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A server's tier backend is not peer server or deep store.

The refactoring in this PR would allow a server's tier backend to be local disk (the default impl) or remote fs (like s3). The segment from server's tier backend is not the raw segment from deep store or peer server, but the fully preprocessed segment ready to serve queries or to apply incremental changes like adding index.

When server uses remote tier backend, the File indexDir does not exist for the backup/restore flow, so I added the existence check at L166 ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referred to the design document https://docs.google.com/document/d/1Z4FLg3ezHpqvc6zhy0jR6Wi2OL8wLO_lRC6aLkskFgs/edit# and pinot documentation https://docs.pinot.apache.org/operators/operating-pinot/tiered-storage

I could not locate the meaning of a "tier backend". In order that the code be maintainable, please provide some clarification

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the pointers. think I should make it clear that it means: a tier of servers with local or remote backend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcvsubbu tierBackend concept was already introduced last year (just updated the design doc to reflect it, sorry for not doing it when the change was added). Explanation regarding the new field added to the design too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean to have a "local" vs "non-local" tieredBackend? How is a segment expected to be different in a tierBackend? (Or, should we have this discussion in the design doc?) I am fine abstracting things as long as the comments in the code are clear about what the interfaces are supposed to do, and what layout of the cluster are we coding for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the questions correctly, lemme use an example to add some clarity:

The segments can move between tiers depends selection criteria like segment age (as said in the docs/wiki ^). The most recent segments may be kept in a tier where the servers in that tier use local disk; older segments may move to a tier where the servers use remote store to keep them. But in those tiers, the segments are preprocessed and ready for queries, just that some tier has low latency and some higher depending on servers' storage backend in those tiers.

As in a reply below, the segments in server's backend (or tier backend) are preprocessed with latest table config and schema and ready for queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer this further, Can you clarify why we download from a "tier" (I assume this to mean a peer storage server that already hosts the segment), as opposed to always downloading from the master and keep things simple? - there is a distinction being made between the master (raw) segment and the tier (fully preprocessed segment ready to serve queries), as mentioned by @klsince above. This distinction is primarily to help us future proof the implementations of lazy loading, where using the master copy of segment may suffice.

segmentTempDir);
FileUtils.deleteDirectory(segmentTempDir);
// Remove backup dir to mark the completion of reloading for local tier backend.
LoaderUtils.removeBackup(indexDir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the comments and renaming code here? Moving this to a method called "removeBackup" in a different class may lead to future modifications that may not keep the semantics of rename and remove (yes, you have moved the comments as well, but I think it is better to keep it local).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. do you mean to do renameTo outside the createBackup() and removeBackup() util methods?

I think rename, then copy or delete are pretty tightly coupled to finish one operation atomically, and I feel separating them would increase complexity for future maintenance.

btw, I extracted the two new util methods createBackup() and removeBackup() into LoaderUtils after seeing reloadFailureRecovery method already there. I kinda prefer to having those methods in one place, as they are all for the failure handling.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing. Please hold off merge.

@klsince klsince force-pushed the decouple_segment_load_with_local_file branch from 03c234b to 8bc9da6 Compare January 5, 2022 00:33
SegmentDirectoryLoader segmentDirectoryLoader =
SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
File indexDir = getSegmentDataDir(segmentName);
segmentDirectoryLoader.download(indexDir, loaderContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the segment directory loader should take care of downloading from tier backend or deepstore, depending on some setting, perhaps in the loadercontext.

I am yet to understand what the difference is between a tier backend segment and the "raw" segment, though. Let me know what I am missing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preprocessed segments (or tier backend segment) are put on server's storage backend (not using local disk here on purpose) to serve queries. SegmentDirectoryLoader interface knows how to load those preprocessed segments, essentially to initialize objects like SegmentDirectory and ColumnIndexDirectory to get ready for queries. Those interfaces abstract away the impl details and also where the preprocessed segments physically are.

It's on purpose not putting logic around deep store behind SegmentDirectoryLoader interface. As deep store is not designed as server's storage backend to hold preprocessed segments. Deep store keeps the raw segments, not preprocessed with latest table config and schema for queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the segment directory loader should take care of downloading from tier backend or deepstore, depending on some setting, perhaps in the loadercontext. - @klsince wdyt of this suggestion? Could we put download logic entirely behind the impl? Or is that hard to do because of the dependencies of downloadFromDeepstore?

Copy link
Contributor Author

@klsince klsince Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedbacks, @mcvsubbu ! We've proposed a new way to refactor the segment loading logic flow like below. pls review.

public void addOrReplaceSegment() {
    segdir = loader.load();
    if (segdir == null || crc is diff) {
      download raw from deep store;
      preprocess(indexDir);
    } else if (not consistent with tbl cfg) {
      segdir.copyTo(indexDir);
      preprocess(indexDir);
    }
    if (preprocessed) {
       segdir = loader.load();
    }
    return;
  }

below is the one used in PR right now as comparison.

  public void addOrReplaceSegment() {
    segdir = loader.load();
    if (segdir == null || crc is diff) {
      download raw from deep store;
      preprocess(indexDir);
      loader.upload(index)
    } else if (not consistent with tbl cfg) {
      loader.download(indexDir);
      preprocess(indexDir);
      loader.upload(index)
    }
    if (preprocessed) {
       segdir = loader.load(); 
    }
    return;
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc does not convey any design details. It is simply a commentary on this PR. Please, include the text of the doc right here in the PR as a comment.

It is not clear (to me) what this PR is achieving. Is this a PR to just improve code readability? (you say "confusing" in your doc). Or, is this a PR to make Pinot more extensible for some purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make Pinot more extensible and future proofing for supporting cases like lazy loading and tiering. Improvement of readability is being done simply to make the review go easier, since that part of code was a lil complex to read to begin with..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored with new way proposed above. Please review the updated code. Thanks!

@klsince klsince changed the title extend SegmentDirectoryLoader interface and refactor BaseTableDataManager refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory Jan 7, 2022
@klsince klsince force-pushed the decouple_segment_load_with_local_file branch 3 times, most recently from cf77dab to a240fe4 Compare January 7, 2022 17:10
LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
FileUtils.copyDirectory(segmentBackupDir, indexDir);
LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {
Copy link
Contributor

@mcvsubbu mcvsubbu Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use the same method as that in line 285 (getSegmentDataDir(segmentName)) ?
And also the FileUtils to copydir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing FileUtils.copyDirectory with SegmentDirectory.copyTo() so that the physical location of the source doesn't have to be a local seg dir. The physical location becomes transparent to the logic flow here.

Think I need to rename getSegmentDirectory() to sth like loadSegmentDirectory() for clarity, as it tries to create the SegmentDirectory object. The util method getSegmentDataDir just composes the file path to seg dir.

LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
FileUtils.copyDirectory(segmentBackupDir, indexDir);
LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be buggy. As I understand it, we seem to be copying from an empty dir to an empty dir, whereas we need to copy from the backupdir to the segment dir. Correct me if I am wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, we need to copy from backup dir, and that's implemented in SegmentLocalFSDirectory.copyTo() in this PR. That method is noop if indexDir exists already, or copies from backup dir to indexDir, keeping the original logic flow intact.

// still on the server but the metadata object has not been initialized yet. In this case,
// we should check if the segment exists on server and try to load it. If the segment does
// not exist or fails to get loaded, we download segment from deep store to load it again.
if (localMetadata == null && tryLoadExistingSegment(segmentName, indexLoadingConfig, zkMetadata)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryLoadExistingSegment calls tryInitSegmentDirectory, which ends up loading the segment if it is locally present.
Yet, in line 510, we call load again. That seems to be a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L510 ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema); uses the initialized SegmentDirectory object to create the ImmutableSegment object as the last step of the segment loading logic flow.

@klsince klsince requested a review from mcvsubbu January 10, 2022 20:24
@klsince klsince force-pushed the decouple_segment_load_with_local_file branch 2 times, most recently from b47697f to e801de0 Compare January 11, 2022 04:03
@klsince klsince force-pushed the decouple_segment_load_with_local_file branch from e801de0 to 638a89a Compare January 11, 2022 06:28
@npawar
Copy link
Contributor

npawar commented Jan 11, 2022

it looks like all major feedback has been addressed. if it’s only comments and naming that’s needing to change, can we merge this and add followups as needed? @snleee @mcvsubbu

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the integration test. Thanks!

@klsince
Copy link
Contributor Author

klsince commented Jan 11, 2022

Please fix the integration test. Thanks!

Yes, I'm investigating the failure after rebasing the latest master. It's running ok on my local. So I'm using a debug PR to dig on this. That debug PR is made off the latest master and with a few prints. But as you can see the failure has moved to another integ test case there. That made me wonder if the Github CI env got some issue. Not sure if you've got permission to help take a quick look at the CI env. Thank you @snleee.

@snleee
Copy link
Contributor

snleee commented Jan 11, 2022

@klsince I see. That may be a transient issue. I re-triggered the tests https://github.com/apache/pinot/actions/runs/1681129412

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM


// Creates the SegmentDirectory object to access the segment metadata.
// The metadata is null if the segment doesn't exist yet.
SegmentDirectory segmentDirectory = tryInitSegmentDirectory(segmentName, indexLoadingConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify that we don't see a warning in the normal case of loading a new segment? It seems to me that the following steps will happen.

addOrReplace() -> tryLoadExistingSegment() -> tryInitSegmentDirectory() -> initSegmentDirectory()

The last method catches and exception and logs a warning.

There should not be any warnings in the normal case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the call stack ^ when loading a branch new segment.

When server loads a brand new segment, there is no indexDir on disk yet, so SegmentLocalFSDirectory is initialized with an empty dir (no warning emit). But with an empty SegmentLocalFSDirectory object, we get to L502/L508 to return false from tryLoadExistingSegment() method. Then, the addOrReplaceSegment() method continues to download the new segment from deep store to finish loading.

This keeps the original logic flow intact. If you check the original code, it does recoverSegmentQuietly and loadSegmentQuietly, and if they fails, then go to download segment from deep store to finish loading.

@mcvsubbu
Copy link
Contributor

Overall, I am a -1 on this PR. It has taken what was a bit complex but at least intuitive to read code to a completely confusing piece of code.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a factory for creating the appropriate segmentDirectory may help. And/or , an interface method at SegmentDirectory level of something like: boolean isLocal()

FileUtils.deleteDirectory(segmentTempDir);
}

private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u add a comment here stating the conditions when it returns true or false. As I understand, true means it existed locally and has been loaded into memory. false means it may exist locally with a different CRC or it may not exist at all. Is the SegmentDirectory always closed upon return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add comment, and for the questions:

It also returns false if loading fails with exception. As before, upon exception to load locally existed segment, we should fall back to download the segment from deep store to finish the loading.

SegmentDirectory object is closed when the method has to return false.

} else {
LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
LOGGER.info("Download segment: {} of table: {} as crc changes from: {} to: {}", segmentName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this log message happen twice when crc changes? is there a way to avoid that? It is already logged in tryLoadExisting() ... method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. Either of them gets logged in one method call, but not both. Because the two CRC checks have used CRC values from different places when comparing with the CRC in SegmentZKMetadata.

Here the localMetadata object is not null, and the CRC from it has been used to compare with the CRC in SegmentZkMetadata (L342). The tryLoadExistingSegment() method is called when localMetadata is null. And the CRC is retrieved from segment potentially exists locally (but not loaded yet) to compare with the CRC in SegmentZkMetadata.

@@ -307,6 +306,30 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
}
}

/**
* Try to reload a mutable segment.
* @return true if the segment is mutable; false if the segment is immutable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return true if the segment is mutable; false if the segment is immutable.
* @return true if the segment is mutable and loaded; false if the segment is immutable.

* @return true if the segment is mutable; false if the segment is immutable.
*/
@VisibleForTesting
boolean reloadMutableSegment(String tableNameWithType, String segmentName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to reloadIfMutabelSegment() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still prefer reloadMutableSegment as you suggested early on.

@mcvsubbu
Copy link
Contributor

This the best I could do as far as suggestions. Please feel free to merge after addressing my minor comments in the latest review. The comment about having a factory is for future, esp. if you are planning more changes in this area due to a non-local segment directory. If we discover issues in production, we may need to undo or re-work some of the changes. Like I mentioned in a previous comment as well, a lot of these are best hidden behind your implementation of SegmentDirectory.

@klsince
Copy link
Contributor Author

klsince commented Jan 13, 2022

I think a factory for creating the appropriate segmentDirectory may help. And/or , an interface method at SegmentDirectory level of something like: boolean isLocal()

Yes, currently, there is a factory method: SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader, which returns certain type of SegmentDirectoryLoader to initialize specific SegmentDirectory objects.

I'm not so sure about SegmentDirectory.isLocal(). This PR didn't require it, as the segment location is made transparent behind SegmentDirectory interface. It may enable other future extensions and would love to revisit it.

@klsince
Copy link
Contributor Author

klsince commented Jan 14, 2022

hi @mcvsubbu thanks a lot for the feedbacks! I've replied your latest comments. Please take a look, and if they look good, could you help lift the 'requested changes' and I'll merge it for now. Thanks!

*
* @return true if the segment still exists on server, its CRC is still same with the
* one in SegmentZKMetadata and is loaded into memory successfully; false if it doesn't
* exist on the server, its CRC has changed, or it fails to be loaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the segmentdirectory is closed when it is false, open otherwise

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment to be added

@npawar npawar merged commit e31e6f9 into apache:master Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants