-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache HEAD API requests to object storage in the plain_rewritable disk #70915
Conversation
This is an automated comment for commit 91d7b7b with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
c8f2f6e
to
c8685b1
Compare
2c27c2e
to
a2859a1
Compare
src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.cpp
Outdated
Show resolved
Hide resolved
src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.cpp
Outdated
Show resolved
Hide resolved
src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp
Outdated
Show resolved
Hide resolved
- Cache last modified metadata object in addition to file size. - Use metadata object cache to determine if a file exists. - Evict from cache when metadata is unlinked.
a2859a1
to
2f533cb
Compare
2f533cb
to
2e38791
Compare
@@ -13,21 +13,28 @@ class MetadataStorageFromPlainRewritableObjectStorage final : public MetadataSto | |||
{ | |||
private: | |||
const std::string metadata_key_prefix; | |||
std::shared_ptr<InMemoryPathMap> path_map; | |||
std::shared_ptr<InMemoryDirectoryPathMap> path_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename path_map
as well, to something like directory_path_map
, or directory_to_remote_info_map
, or something else...
auto object_key = object_storage->generateObjectKeyForPath(path, std::nullopt /* key_prefix */); | ||
if (auto metadata = object_storage->tryGetObjectMetadata(object_key.serialize())) | ||
return std::make_shared<ObjectMetadataEntry>(metadata->size_bytes, metadata->last_modified.epochTime()); | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we once failed to get object metadata, for example, because s3 was temporarily unavailable, then looks like we will add nullptr to cache for this path and it would always be nullptr and never updated, should we instead check the error code because of which we failed to get metadata, and if it is key_doesnt_exist, then probably ok to leave nullptr in cache, but for another reason seems better not to cache it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed so null pointer is not cached.
I think storing a nullptr for non-existed key would be wrong in the present implementation, since I don't update the cache when a file is written.
Remote paths do not change.
…89e457b9669b7a9e9084987ca144b Cherry pick #70915 to 24.8: Cache HEAD API requests to object storage in the plain_rewritable disk
… the plain_rewritable disk
…89e457b9669b7a9e9084987ca144b Cherry pick #70915 to 24.9: Cache HEAD API requests to object storage in the plain_rewritable disk
… the plain_rewritable disk
… the plain_rewritable disk
… the plain_rewritable disk
Backport #70915 to 24.8: Cache HEAD API requests to object storage in the plain_rewritable disk
Backport #70915 to 24.9: Cache HEAD API requests to object storage in the plain_rewritable disk
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Reduce the number of object storage HEAD API requests in the plain_rewritable disk.
Documentation entry for user-facing changes
CI Settings (Only check the boxes if you know what you are doing):