GC: Error status and object is not found but job is running ? #14966
Comments
Can you share the job service log as well? It would help narrow down the problem. Also, can you confirm whether other jobs (Replication, Scan, or Retention) were running at the time of GC execution? We have found that the job log may fail to be created in some particular cases, such as when the job service is under high load. |
Sure. However, after a failure, if I don't restart Harbor (the job service pod), I get this response when requesting the log of any garbage collection (even one that failed several days ago, with a restart since then):
Storage is on S3. I have many job service logs with Here is an example of a job service. So far, I have never been able to run a complete GC. I start GC, it fails a few hours later, I restart the job service and restart GC, it fails again ... |
I got another error when retrieving the log after the job failed: Maybe this is related to #12948 after all? |
You can upgrade to the latest Harbor; it introduces a retry on GC failure within one minute. |
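For readers unfamiliar with the idea, "retry on failure within one minute" can be sketched generically as below. This is only an illustration of a time-bounded retry, not Harbor's actual implementation: the function name `retry_within`, its parameters, and the 5-second delay are all hypothetical.

```python
import time


def retry_within(operation, window_seconds=60.0, delay_seconds=5.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call `operation` until it succeeds or the retry window is used up.

    Hypothetical helper for illustration only; it is not part of Harbor.
    Returns a tuple of (result, number_of_attempts).
    """
    deadline = clock() + window_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation(), attempts
        except Exception:
            # Re-raise the last error once another wait would pass the deadline.
            if clock() + delay_seconds > deadline:
                raise
            sleep(delay_seconds)
```

The `clock` and `sleep` parameters are injectable so the behaviour can be checked without actually waiting a minute.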
Thanks @wy65701436 I will try that :) |
@guyguy333 please reopen it if you still encounter the problem. |
Hi, I have upgraded Harbor to 2.3.3 and I have the same error:
Jobserver.log in time range:
|
We are having the same issue in 2.6.2.
@wy65701436 or @steven-zou, can you please re-open? Or shall I open a new one? What other information do you want me to provide? |
Same symptoms. I use Ceph Radosgw as S3 storage. I have 2.66 TB of overall registry space used (and growing). Not a single job (weekly because of #14774) has succeeded since September 2022. P.S. The jobservice pod's CPU consumption is close to zero and the pod's stdout does not contain anything worth mentioning. Could it be that it's doing nothing? I just looked at the jobService dashboard: all jobs are in Pending state. There is one worker with concurrency 100, and it has not picked up any jobs. |
Removing all Redis instances (KeyDB in my case) together with all their PVCs seems to have done the trick, and now one worker is doing GC after a manual restart. |
My Harbor version is 2.2.0, and I have the same issue. |
Hi Team - What is the log entity? How can it help me locate the root cause? |
Seeing this with v2.11.0 – should this be reopened or shall I create a new issue? |
Thx for following up. We are just now installing 2.11. I would say leave closed and we'll open new issues as we encounter them with 2.11. |
I am running with the version
Job status is Running and from logs of
|
Expected behavior and actual behavior:
As long as the GC job is running, its status shouldn't be Error. However, I'm pretty sure the GC job is still running, as the registry is being spammed with DELETE requests. Moreover, I'm unable to fetch the job log, so I believe the job is not done.
Here is the output clicking on Log:
Steps to reproduce the problem:
Run a long GC job (we have 1.5 TB and ~175k objects on S3)
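To check for this mismatch from a script while a long GC job runs, something like the following could be used. It is a rough sketch under several assumptions: that the instance exposes the Harbor v2.0 REST paths `/api/v2.0/system/gc/{gc_id}` and `/api/v2.0/system/gc/{gc_id}/log`, that the GC record carries a `job_status` field, that basic auth is accepted, and that the failed log fetch surfaces as HTTP 404. Please verify all of these against the Swagger page of your own Harbor version. The helper names are hypothetical.

```python
import base64
import json
import urllib.error
import urllib.request


def gc_urls(base_url, gc_id):
    """Build the (status, log) URLs for one GC execution (assumed v2.0 paths)."""
    root = f"{base_url.rstrip('/')}/api/v2.0/system/gc/{gc_id}"
    return root, f"{root}/log"


def is_inconsistent(job_status, log_http_code):
    """True for the symptom reported here: status Error and log not found."""
    return job_status == "Error" and log_http_code == 404


def check_gc(base_url, gc_id, user, password):
    """Fetch the GC status and probe its log; returns (status, code, flag)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    status_url, log_url = gc_urls(base_url, gc_id)

    request = urllib.request.Request(status_url, headers=headers)
    with urllib.request.urlopen(request) as resp:
        job_status = json.load(resp).get("job_status")

    try:
        request = urllib.request.Request(log_url, headers=headers)
        with urllib.request.urlopen(request) as resp:
            log_code = resp.status
    except urllib.error.HTTPError as err:
        # A missing log is reported as an HTTP error rather than a body.
        log_code = err.code

    return job_status, log_code, is_inconsistent(job_status, log_code)
```

Calling `check_gc("https://harbor.example.com", 42, "admin", "...")` (placeholder host, id, and credentials) periodically during the run would show when the status flips to Error while the registry is still receiving DELETE requests.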
Versions:
Please specify the versions of the following systems.
Additional context:
harbor.yml and files in the same directory, including subdirectories; /var/log/harbor/