-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add proposal of non-blocking GC #133
Conversation
Signed-off-by: wang yan <[email protected]>
Deletes the marked image layers. | ||
|
||
### Mark | ||
Bases on the Harbor DB, we can count each blob/manifest's reference count, and select the reference count 0 as the candidate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before Mark, there is an optional step to remove the untagged artifacts. Could you also consider that step in the proposal?
Will there be any race-condition in that step?
If the answer is no, could you explain in the proposal why not?
proposals/new/Non-Blocking-GC.md
Outdated
The delete candidate excludes all of blobs that in the project & blob. | ||
|
||
1. candidate set 1 -- all blobs from table blob exclude the items in the table project & blob. | ||
2. candidate set 1 excludes all of referenced blobs (artifact -> artifact & blob -> blob). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each artifact must belong to a project, so then why there are "referenced blobs" not in "project blobs"?
This is not strictly in the scope of the non-blocking GC but looks like there's redundancy in the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, the step 1 is enough for Mark, all of referenced blobs are in table project&blob.
But add step 2 is just double guarantee that any of referenced blob by artifact will not be in the candidate.
The manifest or blob will only be removed from the deletion candidates, and it can be GCed in the next execution. | ||
|
||
### Delete Blob & Manifest | ||
We'd like to enable the registry controller to have the capability to delete blob & manifest via digest by leveraging the distribution code as library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it only needs to support deleting blobs? Do we need to treat manifest any differently?
proposals/new/Non-Blocking-GC.md
Outdated
|
||
**DOUBLE GUARANTEE** | ||
|
||
We need to introduce the cutoff time, any to be deleted blob & manifest, the update time must not be later than the cutoff time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there update time of a blob in DB?
The basic flow is: | ||
|
||
* Mark the GC candidates in Harbor Core | ||
* Trigger a GC job and pass the candidates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to do the marking in GC job?
In GC job you will remove untagged artifacts, and that will increate the set of candidates?
![mark_uploading](../images/non-blocking-gc/mark_uploading.png) | ||
|
||
### Sweep | ||
The registry controller will grant the capability of deleting blob & manifest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow the questions and solutions in this section.
Generally I don't quite understand the deleting and delete status, could you explain what are they and how do you check it?
And I'm not very sure it's safe to assume the HEAD
request. What if there's a 3rd party client that doesn't do the HEAD before pushing?
#### Question 4, what about if client only sends head request, and no put following. | ||
The manifest or blob will only be removed from the deletion candidates, and it can be GCed in the next execution. | ||
|
||
### Delete Blob & Manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, there can race condition two processes (registry/ registry ctl) calling the storage API to handle same blob, do we assume the storage should handle such situation?
Let's find some proof this assumption is reliable, and reference it in this proposal.
We also need to clarify how do we handle the errors returned by storage service in such race condition.
Signed-off-by: wang yan <[email protected]>
Signed-off-by: wang yan <[email protected]>
Signed-off-by: wang yan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed through online review meetings and I think it's ok now.
|
||
|
||
#### Question 1, how to deal with the uploading blobs at the phase of marking. | ||
We do have a table to record the uploading blobs info, that's project & blob. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it makes things too heavy. and concurrent risk still exists.
it brings complex when you decide to treat Image GC as a immediate job. Nobody can avoid 'Stop The World'. Maybe we can treat it as an offline job and users only want to delete the blobs that has been stale for a long time but not the blobs in active. |
According to the lazy consensus principle, merge this proposal PR. The feature will be delivered in V2.1. |
Signed-off-by: wang yan [email protected]