Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/endpoint: Add a new component GCStateStorage and GCStateProvider to storage/endpoints as the new storage layer for GC related metadata #9109

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

MyonKeminta
Copy link
Contributor

@MyonKeminta MyonKeminta commented Feb 27, 2025

What problem does this PR solve?

Issue Number: Ref #8978

What is changed and how does it work?

Add a new component `GCStateStorage` and `GCStateProvider` to storage/endpoints as the new storage layer for GC related metadata.
This is part of the GC Interface Refactor task. The `GCSafePointStorage` is planned to be deprecated, and the GCStateStorage will be the replacement. It's currently not used, and will be used when we refactor the upper layer (`GCSafePointManager`->`GCStateManager`, the gRPC layer, etc.).

Check List

Tests

  • Unit test

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

Release note

None.

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Feb 27, 2025
Copy link
Contributor

ti-chi-bot bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nolouch for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 27, 2025
@MyonKeminta MyonKeminta mentioned this pull request Feb 27, 2025
13 tasks
Signed-off-by: MyonKeminta <[email protected]>
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 79.68338% with 77 lines in your changes missing coverage. Please review.

Project coverage is 76.37%. Comparing base (027c0d7) to head (c5653a2).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9109      +/-   ##
==========================================
+ Coverage   76.29%   76.37%   +0.07%     
==========================================
  Files         473      474       +1     
  Lines       71663    72034     +371     
==========================================
+ Hits        54677    55016     +339     
- Misses      13559    13578      +19     
- Partials     3427     3440      +13     
Flag Coverage Δ
unittests 76.37% <79.68%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
@MyonKeminta
Copy link
Contributor Author

/retest

"github.com/tikv/pd/pkg/utils/keypath"
)

// ServiceSafePoint is the safepoint for a specific service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding more comments to explain which component or feature will use ServiceSafePoint/GCBarrier/TxnSafePoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add comments to the start of this file to describe the related concepts. But ths interfaces in this file is only expected to be used in GCStateManager, which will come in another PR.


// keyspaceGCSafePoint is the data structure when writing
type keyspaceGCSafePoint struct {
KeyspaceID uint32 `json:"keyspace_id"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the ID used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing but for keeping the data format compatible with the exisiting design of the "GC API V2". I forgot to finish the comments of this type. I'll fix it.

}

// loadJSON loads a specific key from the StorageEndpoint, and parses it as JSON into type T.
func loadJSON[T any](se *StorageEndpoint, key string) (T, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move loadJSON and loadJSONByPrefix to endpoint.go?

// In the transaction, reads can be performed on the GCStateProvider as usual, while writes should only be performed
// through the GCStateWriteBatch.
func (p GCStateProvider) RunInGCStateTransaction(f func(wb *GCStateWriteBatch) error) error {
revisionKey := keypath.GCStateRevisionPath()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use the revision provided by etcd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but hard to provide it over the abstracted kv.Base interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it needs a single centrialized key anyway.

txn := p.storage.CreateRawTxn()
result, err := txn.If(condition).Then(ops...).Commit()
if err != nil {
return errors.AddStack(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about return errs.ErrEtcdTxnInternal.Wrap(err).GenWithStackByArgs()

return errs.ErrEtcdTxnConflict.GenWithStackByArgs()
}

if len(ops) != len(result.Responses) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens, do we need to rollback or manually fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understanding is correct, etcd shoudl guarantee this to be true. I expect that this path is never reachable. If this happen, I think it should be a bug in etcd or etcd-client and it doesn't look possible to automatically fix it in a simple way. And as the Commit returns no error here, it seems we no longer have chance to roll it back atomically. I think if this does happen, we may have no choice but to manually recover it.

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
@MyonKeminta
Copy link
Contributor Author

/retest

1 similar comment
@MyonKeminta
Copy link
Contributor Author

/retest

@@ -0,0 +1,541 @@
// Copyright 2024 TiKV Project Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright 2024 TiKV Project Authors.
// Copyright 2025 TiKV Project Authors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time went too fast before I noticed it😕

gcSafePointV2PathFormat = "/pd/%d/keyspaces/gc_safe_point/%08d" // "/pd/{cluster_id}/keyspaces/gc_safe_point/{keyspace_id}"
serviceSafePointV2PathFormat = "/pd/%d/keyspaces/service_safe_point/%08d/%s" // "/pd/{cluster_id}/keyspaces/service_safe_point/{keyspace_id}/{service_id}"
gcStateRevisionPathFormat = "/pd/%d/gc/gc_state_revision" // "/pd/{cluster_id}/gc/gc_state_revision"
gcSafePointPathFormat = "/pd/%d/gc/safe_point" // "/pd/{cluster_id}/gc/safe_point"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming this variable globalGCSafePointPathFormat directly? Other developers might have questions about the relationship between gcSafePointPathFormat and keyspaceGCSafePointPathFormat: keyspaceGCSafePointPathFormat.

return fmt.Sprintf(gcStateRevisionPathFormat, ClusterID())
}

// GCSafePointPath returns the key path of the global (NullKeyspace) GC safe point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about explicitly introducing and explaining the relationship between global and NullKeyspace in the code comments? This can help everyone understand its meaning clearly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants